You can also use Google's cache: operator instead of the site: operator if you know the exact link.
Example:
cache:http://www.candlepowerforums.com/vb/...d.php?t=304095 (just one I happened to have bookmarked). It won't show the style sheet or CSS for vBulletin's theme, but it is a start for anyone who needs post content.
Hope this helps
Here are mine:
SureFire E2L AA Outdoorsman Review: http://webcache.googleusercontent.co...www.google.com
Maglite XL50 Review: http://webcache.googleusercontent.co...www.google.com
Streamlight PT 1AA and 2AA Review: http://webcache.googleusercontent.co...www.google.com
Streamlight Stinger DS LED HP Review: http://webcache.googleusercontent.co...www.google.com
I can't seem to get the Google cache to work for me: it finds the thread, but when I go to the cached copy it says the document cannot be found. The links posted above by others aren't showing properly either. What am I doing wrong?
The only thread of mine of any significance (well, to me at least)
http://webcache.googleusercontent.com...www.google.com.au
"The Universe is big, there's lots of stuff to think about" - Dr. Karl Kruszelnicki
All the threads I'm looking up in Google cache have only the first page/30 posts; the cache for my cars thread is from November 2009 - that thread had just under 800 posts in it
Last edited by StarHalo; 02-27-2011 at 11:43 PM.
It does make me wonder, how bad would it be to draw a line in the sand and re-boot the entire forum?
I realise that there's a LOT of very good info contained within CPF, but couldn't the old forum be kept as an archive, for reference only?
This would give the benefit of a clean start, on a new server and all the problems erased.
It would, sadly, negate all of Greta's hard work over the last few weeks...
My only thread that I would like to resurrect is
http://webcache.googleusercontent.co....google.com.au
Here's one I would hate to lose - only find three pages though:
http://webcache.googleusercontent.com/search?q=cache:w3vB1cxKPgUJ:www.candlepowerforums.com/vb/showthread.php%3Ft%3D303913%26page%3D3+candlepower forums+meet+devil's+dyke+lanyard&cd=1&hl=en&ct=clnk&gl=uk&source=www.google.co.uk
http://webcache.googleusercontent.com/search?q=cache:5GM0BO-jie8J:www.candlepowerforums.com/vb/showthread.php%3Ft%3D303913%26page%3D4+candlepower forums+meet+devil's+dyke&cd=1&hl=en&ct=clnk&gl=uk&source=www.google.co.uk
http://webcache.googleusercontent.com/search?q=cache:PWVFr6ApL8cJ:www.candlepowerforums.com/vb/showthread.php%3Ft%3D303913%26page%3D1+candlepower forums+meet+devil's+dyke+GOLF&cd=2&hl=en&ct=clnk&gl=uk&source=www.google.co.uk
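For anyone collecting links like these, the original thread address can be read back out of a cache link. This is only a rough sketch in Python. It assumes the link has the `q=cache:<id>:<address>+<search terms>` shape seen in the links above, and that any stray spaces the forum software inserted into the long link have been removed first. The function name `cached_target` is my own invention, not anything Google provides.

```python
from urllib.parse import urlsplit, parse_qs


def cached_target(cache_link):
    """Return the page address embedded in a Google cache link, or None."""
    params = parse_qs(urlsplit(cache_link).query)
    q = params.get("q", [""])[0]
    if not q.startswith("cache:"):
        return None
    # parse_qs has already decoded %3F, %3D, %26 and turned "+" into spaces,
    # so q now looks like "cache:<id>:<address> <search terms>".
    rest = q[len("cache:"):]
    first, sep, remainder = rest.partition(":")
    if not sep:
        return None
    if remainder.startswith("//"):
        # No cache id present: "first" was really the scheme ("http").
        candidate = rest
    else:
        candidate = remainder
    # Drop the search terms that follow the address.
    return candidate.split(" ", 1)[0]
```

Fed the second link above (spaces removed), this should give back `www.candlepowerforums.com/vb/showthread.php?t=303913&page=4`, which is handy for working out which pages of a thread have been found and which are still missing.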
Oh, and this one, just for laughs:
http://webcache.googleusercontent.co...w.google.co.uk
This runtime test:
http://webcache.googleusercontent.co...w.google.co.uk
Long thread on a new Fenix purchase:
http://webcache.googleusercontent.co...w.google.co.uk
I'll look for more as well. Do we need to search out all pages, and do we need to bother about threads started before last November but updated to this day?
Good luck Greta, much appreciated.
Last edited by Nyctophiliac; 02-28-2011 at 03:15 AM.
Hi Greta
I am lost without this site ... I'm on it whenever "she who must be obeyed" is watching her soaps on TV.
On March 1st, I will be 73 and I will willingly give up my slice of birthday cake just to be able to go back on CPF.
I wish you success in restoring whatever you can for all the avid readers on CPF.
At my age, my short-term memory isn't great, so if I have to read all the latest re-posts, it will be OK.
Thank you for all the work that you have done for us all ... It is very much appreciated by us all.
Best of Luck to you in this our darkest hour/day/week (delete the inapplicable) ... This is me trying to make a joke out of a serious situation.
Kind Regards
XXX
Show your Solarforce Thread:
http://webcache.googleusercontent.co...www.google.com
Li-ion Battery Comparison:
http://webcache.googleusercontent.co...www.google.com
There was only one thread that I started that I think has any persistent value:
http://webcache.googleusercontent.co...www.google.com
As I am typically only a reader on CPF-Marketplace, this is, sadly, my first post here! I was moderately active (100-200 posts) on the main CPF. If there is anything *at all* additional I can do to assist, I will find the time.
Emitter Index | The Bright Side Forums | Illumination Supply
Veleno Designs Quantum DD | Join Us on Facebook! | Nichia 92 CRI 4500K!!
Please contact me through my website for sales related questions.
I've got the vast majority of my own threads, as well as some that I've found particularly interesting, archived on my home network. I'd need to know how to send you some files, Greta, by snail mail or perhaps uploading to a share of your choosing... I could zip the files into an archive to make it as simple as possible... if you want/need that, PM or email me with info on where/how to get the files to you.
Regarding ccshih's thoughts on a possible forum reboot, this could be nice if handled properly. (I don't presume to know the perfect way to implement it.) At the moment, my thinking is, the more archived info that remains accessible and searchable, the better. That said, it might simplify things quite a bit if threads not having seen a new post in at least a year became read-only... they would become a wiki of sorts, for reference purposes, while not bogging down the mainline database any longer. Moreover, they could be cached by the web server as static pages, reducing access times dramatically. Folks could still link to them or even include quotations when referencing the material in a newer thread.
I'm sure there are more dynamics than the above to this issue, but maybe it at least starts some discussion that ultimately leads to the best possible system for users, the server, and of course, Greta herself. Hope it helps!
I wonder if HTTrack would help out here. I've used it in the past to back up a website for personal use when I was having internet problems. Don't bother downloading the images or external links; otherwise it will take forever and try to download the whole internet.
I can set my extra computer up to download/backup CPFMP if you want, Greta, in case that ever happens here (though you lose much less than with CPF proper, if that happens)
http://www.httrack.com/
Maybe you can set it up to download all CPF cached pages from Google. Heck, it's at least worth a try!
~Brian
2 questions:
- Why are there no nightly backups to restore from? Where I work we back up to tape and then store offsite.
- If no nightly backups were enabled then will they be enabled from the present?
The volume of traffic the website handles mandates that the server be based in a colo center, and there aren't people there just to physically handle backups. Any backups would likely be handled via cloud-style data storage so that they are off-site and automated. I wonder what size the backup file would be.
Various Neutral Tinted Goodness.
Greta, so sorry to hear of your troubles. I'd just like you to know how much all your work for this forum is appreciated.
Two comments, one short and one long:
First, let's stop pestering Greta with who-did-what-and-why questions about the site and backup. Speculation and recriminations aren't helpful, and neither are suggestions for off-site backup, failover systems and the like unless they are also met with large monthly donations.
Second... recovering entire threads intact from Google cache data is too much to ask for.
I'm not one who knows "just a little bit" about this, I've actually done recovery work for a vBulletin forum using Google cache data as the source for **partially** rebuilding the vB database including restoring thread structure, posts, and some limited user data.
It sounds marvellous -- rebuild from cache -- but it is hugely tedious work to grab all the data from cache, virtually impossible to gather it all and get it perfectly right, and it requires that tons of cache data be collected. Past experiments I've done in automating the process by spidering Google's cache have resulted in the IP addresses I was using for the test being blocked by Google for a period of time. One could rate-limit a spider and maybe get away with trawling the Google cache, but given the large number of pages that would need to be retrieved it seems unlikely this activity would escape Google's attention.
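To make "rate limit a spider" concrete, here is a minimal sketch in Python of the throttling part only. Everything in it is an assumption for illustration: the `Throttle` class, the `fetch_all` helper and the `fetch` callable passed in are hypothetical names, and no particular delay is known to keep a spider from being blocked.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing between requests."""
    pages = {}
    for url in urls:
        throttle.wait()
        pages[url] = fetch(url)
    return pages
```

With `Throttle(30)` the loop would make at most one request every 30 seconds, so 10,000 pages would take about three and a half days from a single machine. That arithmetic is a large part of why trawling a whole site this way is impractical.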
Google cache isn't a single entity either; different results are returned from different Google datacenters. If Greta and I both retrieve the cache for a certain page it is possible and even likely that each of us will be returned data from different cache dates.
Another reality must be faced: the cache data won't live forever. I don't know Google's policy or algorithm for deciding when to drop the data; chances are that the clock isn't ticking for the majority of pages, as the site currently returns a 503 Service Unavailable status to any HTTP request. While this remains so, Google will tend to ignore the site, assuming a resolvable problem is being worked on, but if the outage persists, eventually the cache for the site will be one big blank. Don't panic about this.
Panic about this instead: once the backup version of the site goes live and HTTP requests (your browser views, Google's cache-gathering spiders) to CPF return a situation-normal 200 OK status, Google will eventually conclude that CPF now really does look like whatever it was as of November 1 2010. It will not know or care that the site has been rebooted from an old backup. Thus cache links for pages more recent than Nov 1 2010, which from then on point to nowhere, will start to disappear. This will likely start to happen within days, or a couple of weeks at most, of the new-old CPF coming back online.
As nice as Google cache is, it is not a complete copy of the entire site, at least not in one easy-to-obtain chunk of data.
More info on Google cache:
http://www.google.com/intl/en/help/f...st.html#cached
http://www.googleguide.com/cached_pages.html
Here's the format for creating a query from a link rather than in the address bar of your browser:
http://www.google.com/search?q=cache:TARGET_URL
Some examples, e.g. the Batteries Included forum:
- http://www.google.com/search?q=cache...isplay.php?f=9
- Returns a page with cache date of: 27 Feb 2011 13:10:31 GMT
A thread from within the forum, which shows up on the subforum page as having 43 replies as of that date:
- http://www.google.com/search?q=cache...d.php?t=308940
- Returns a page (page one of 2, 32 posts total) with cache date of: 23 Feb 2011 14:37:46 GMT
Just for that one smallish thread alone a bunch of posts are missing. Maybe another Google datacenter will return a different - newer - version. Maybe not. It isn't a perfect tool.
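For those who would rather generate these query links than type them, here is a small sketch in Python. It follows the `http://www.google.com/search?q=cache:TARGET_URL` format given above. The function names are mine, and the `&page=N` suffix is an assumption based on the vBulletin thread links posted earlier in this thread.

```python
from urllib.parse import quote

CACHE_QUERY_PREFIX = "http://www.google.com/search?q=cache:"


def cache_query_url(target_url):
    """Build a Google cache query link for one page address."""
    # Keep ":" and "/" readable, but encode "?", "=" and "&" so they are
    # treated as part of the target address rather than of the query itself.
    return CACHE_QUERY_PREFIX + quote(target_url, safe=":/")


def thread_page_queries(thread_url, pages):
    """Cache query links for pages 1..pages of a thread (assumes &page=N)."""
    return [cache_query_url("%s&page=%d" % (thread_url, n))
            for n in range(1, pages + 1)]
```

Each link still has to be opened and its cache date checked by hand, since different pages of one thread may have been cached on different dates.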
There is no magic bullet for automatically recovering thousands of lost threads and tens of thousands of lost posts. I have been there, done that, and that was with a corpus of information provided to me, gathered by hand, by that forum site's owner. Parsing out the thread, post and user/author information wasn't hugely difficult - hundreds of lines of code not tens of thousands - but in the end it really wasn't worth it due to the big gaps in the data provided and that was for a forum site orders of magnitude less complex and less active than CPF is.
The bulk of the work involved something along the order of:
- stripping the Google component of the HTML in the cache pages
- parsing the remaining HTML to extract thread (thread name, thread id) information and individual posts (post titles, post ids, author, content, dates, and the like)
- parsing the individual posts with a relaxed HTML processor and converting the HTML within to BBCODE/vBCODE, the markup language we use when writing posts (e.g. smileys, bold, italics, quote tags and such), then writing this all out to a database to be used later in the re-import process
- extracting username data (username, userid links/ids) and using this to compare against the user DB as of the last backup; some ids were recreated to preserve author-thread-post relationships, some were unavoidably lost given the cache data was incomplete
- marking certain posts and threads extracted as duplicates (site search data was intermixed within the cache data returned)
- running some sanity check code; creating a copy of the entire site DB and running test imports against the copy until a clean import could be had
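To give a feel for the HTML-to-BBCODE step in the list above, here is a minimal sketch in Python using the standard library's `html.parser`. It is not the code used in that recovery. The `BBCodeConverter` class and `html_to_bbcode` function are illustrative names, and the sketch handles only bold, italics, underline, quotes, line breaks and links. Smileys, images, nested quote attribution and the database write are all left out.

```python
from html.parser import HTMLParser


class BBCodeConverter(HTMLParser):
    """Convert a small subset of post HTML to BBCode; other tags are dropped."""

    SIMPLE = {"b": "B", "strong": "B", "i": "I", "em": "I", "u": "U",
              "blockquote": "QUOTE"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._open_links = []   # stack: True if the <a> produced a [URL=...]

    def handle_starttag(self, tag, attrs):
        if tag in self.SIMPLE:
            self.out.append("[%s]" % self.SIMPLE[tag])
        elif tag == "br":
            self.out.append("\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            self._open_links.append(bool(href))
            if href:
                self.out.append("[URL=%s]" % href)

    def handle_endtag(self, tag):
        if tag in self.SIMPLE:
            self.out.append("[/%s]" % self.SIMPLE[tag])
        elif tag == "a" and self._open_links:
            if self._open_links.pop():
                self.out.append("[/URL]")

    def handle_data(self, data):
        # Text between tags is kept as-is (entities are already decoded).
        self.out.append(data)


def html_to_bbcode(html):
    """Return the BBCode equivalent of a fragment of post HTML."""
    parser = BBCodeConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The standard-library parser is tolerant of sloppy markup, which is the "relaxed" behaviour wanted here: unknown or unbalanced tags are skipped rather than causing an error.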
Possibly I've left a few things out. As I said this site was much less complex and less involved than CPF. I'm looking at my own code and I don't like what I see, but that's not unusual.
Is trying to automate the recreation of lost threads and replies worth it? Having gone down this road for a similar but much smaller case, I don't think so. Unfortunately it takes just as much work effort to recover the mundane as it takes to recover true gems.
In my opinion if there are nuggets of gold that really need to be recovered, users with a vested interest in retrieving those nuggets should drive the recovery effort by obtaining as much as they can from Google's cache, asap, storing it local to them and reposting (or passing on to the original author to repost, or posting with proper attribution) when CPF is again operational. Those motivated should drive their own bus in short. This requires no hand-holding from Greta, no special setup, no mammoth programming and sys admin exercise, and no extended downtime for the site.
Sure, some content will be lost forever but new content will take its place and the community can get on with life again. And life will go on.
Looking at the bright side, I'll get to celebrate my 1000th post all over again.
Last edited by tandem; 02-28-2011 at 10:38 AM. Reason: typos
Here are a few of my threads from Google Cache.
I am into short-arc now. LEDs & HIDs etc.: what's past is past, letting them go...
Good luck with the recovery! Didn't realize how important CPF is until it's gone.
Moon Blaster:
http://webcache.googleusercontent.co...www.google.com
Mega Blaster:
http://webcache.googleusercontent.co...www.google.com
CA-230: 150W Xenon Short-Arc Searchlight
http://webcache.googleusercontent.co...www.google.com
Mini Barn Burner, only because I have a short-arc mod coming for this host:
http://webcache.googleusercontent.co...www.google.com
Last edited by ma_sha1; 02-28-2011 at 06:49 PM.
Thank you tandem for that very informative post. It was very helpful in showing those of us who don't know what's going on the magnitude of the problem and the difficulty of repairing it. I had no idea.
Just do your best, Greta, as has been stated already. The collective brain that is CPF will little by little rebuild whatever is lost and make it work. Back in October, all those posts were not even in existence yet, and we survived, right? Plus, now we get to be excited about new XML lights and the HDS Rotary all over again! It's kinda like a time machine. It'll be ok. At least it's a flashlight forum, and not a database of the latest knowledge on cancer research or neurosurgery or something. Could be a lot worse.
While it's a shame to lose 3 months of posts, all is not lost. I'd move on for now, get the site back up and create a forum called "The 11/2010 lost posts" or something like that, where people can try to recreate the missing posts. They can then be moved to the correct forum. Not ideal, but maybe the best we can do for now. Stuff like this happens.
Fenix P1, L1S, L2T, L0Pse, Inova 24/7, XO3, X1(old), & Radiant 2C & AAA, Cyclops 15MCP, Streamlight ProPolymer 4-AA 7LED,
MAG 4D&5D,AA Conv. to LED, Dorcy Super 1W, 5MCP, MetalGear, & AAA LED, Task Force 2C 3W ... and soon to include more.
"Those motivated should drive their own bus in short."
Is he sayin' we all belong on the short bus?
I just want to add something.
I've never quite understood having multiple forums running for CPF and its sister sites, CPFMP, CPFG, CPFP. It creates confusion for new users, and some discussion threads are created in the wrong forum, so finding the info is sometimes a little tedious when it means searching multiple forums. Would it not be better to just have one forum to host it all? Restrictions and moderation can be adjusted on a sub-forum level. The domain names can be set to forward to the appropriate sub-forum using a simple redirect page. Since CPFMP is up and running, you can piggyback off of it to start CPF and the other forums relatively quickly and easily. Some CPF users are already registered here and are familiar with it.
Something all of us can do, even if we have no technical skills, is to help defray the cost of getting CPF up and running again. I'm sure whatever avenues are being pursued to correct this are probably not cheap. New hardware and IT services mean $$$. If 1,000 of us brewed our own coffee and ate a Kashi bar today rather than going to Starbucks for a latte and a scone and sent that $5 to Greta, that's $5,000 just like that. And likely, it didn't hurt us much. I would encourage all members to renew their subscriptions, even if it's just a few dollars. I'm confident it will go to good use in dealing with The Great Crash.
I know it's not much, but I hope my subscription renewal is helpful, Greta.
BTW selfbuilt: If you ever need it.. Free webhosting. (doesn't include a domain, though.)
Might be good to have a backup of sorts.