menos I M6 · Posted May 23, 2016 · #41

Quoting jmahto: "About digital degradation... I have approximately 60k files in my LR repository and I have noticed only 2 files (DNG) that the OS could not read after some time. The corruption was detected during backup, when the copy failed for those two files. I don't have any explanation for why it happened, but I have now become more careful about backing up the originals and doing periodic exports (processed JPEGs) that are stored separately. Even then I feel better about digital longevity vs film. I once lived for a year and a half in a humid place and noticed my slides and negatives degrading due to growing fungus. I had to send them to a non-humid place to stop the fungus growth. Digital, just like any other medium, needs appropriate precautions."

Jayant, you were lucky to find out about these two files by accident. In my opinion, the very minimum for everyone serious about data verification is to run checksum tests on all archive files. Here is a neat little Lightroom plugin which is very straightforward to use; although it is not without flaws, it makes a first attempt at data verification of Lightroom-managed files very easy: http://bayimages.net/blog/lightroom/validator/

On top of this Lightroom plugin I use this standalone software: http://diglloydtools.com/integritychecker.html

It requires some manual work to verify your files, and both solutions need a certain amount of time to check a full archive (I basically run scheduled partial checks overnight).

The most important part of data protection is to run as many backups as you can manage / afford / can be bothered with and … here is the important part … spread those backups over as long a time frame as you can get away with. This lets you recover, from older backups, broken files that have already propagated into your newer backups. There is no point in running five consecutive backups on the same day; rather, run three backups spread over a much longer period of time.
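For anyone who prefers not to depend on a plugin, the same principle is easy to script. Below is a minimal sketch (an illustration only, not the Validator plugin or diglloyd's tool): on the first run it records a SHA-256 checksum for every image file in an archive folder into a manifest, and on later runs it recomputes and reports any file whose contents no longer match. The archive path, manifest name and file extensions are invented examples.

```python
# checksum_manifest.py - minimal sketch of archive verification by checksum.
# Assumes Python 3; the paths below are examples only, adapt to your own archive.
import hashlib
import json
from pathlib import Path

ARCHIVE = Path("/Volumes/PhotoArchive")   # example archive location
MANIFEST = ARCHIVE / "checksums.json"     # example manifest file

def sha256_of(path: Path) -> str:
    """Stream the file in chunks so large raw files do not exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest() -> None:
    """First run: record a checksum for every image file in the archive."""
    sums = {}
    for p in ARCHIVE.rglob("*"):
        if p.is_file() and p.suffix.lower() in {".dng", ".jpg", ".tif", ".nef", ".cr2"}:
            sums[str(p.relative_to(ARCHIVE))] = sha256_of(p)
    MANIFEST.write_text(json.dumps(sums, indent=2))
    print(f"Recorded {len(sums)} checksums.")

def verify_manifest() -> None:
    """Later runs: recompute every checksum and report anything that changed."""
    sums = json.loads(MANIFEST.read_text())
    bad = [rel for rel, old in sums.items()
           if not (ARCHIVE / rel).exists() or sha256_of(ARCHIVE / rel) != old]
    print("All files verified OK." if not bad else f"Corrupted or missing: {bad}")

if __name__ == "__main__":
    verify_manifest() if MANIFEST.exists() else build_manifest()
```

One caveat with any manifest approach: files you deliberately edit in place will also show up as "changed", so it works best on originals that are never modified after import.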
jmahto · Posted May 23, 2016 · #42

(quoting menos I M6, #41)

Thanks... I will check out the plug-in. However, I have a question. Isn't the OS being unable to read a file (as in my case) a definite sign of corruption? Or can a file be corrupted while the OS is still able to copy it correctly? For backup I use NovaBACKUP in file-copy mode. I totally agree about spacing backups over different devices so that overwrites don't happen. I also export processed JPEGs and keep a backup of those, in case the LR repository gets lost or corrupted beyond recovery.
menos I M6 · Posted May 23, 2016 · #43

Jayant, the corruption that happens most of the time just flips a bit or a few; this is usually too small a corruption to damage an image file so severely that it becomes unreadable. Unfortunately, this means that file management software such as Lightroom being able to access and render a preview from such a damaged file is no indicator of whether the file is safe. A file checksum check is the only reliable way to find corruption in image files. Unfortunately these checks need time, CPU cycles and heavy drive use (every single file has to be accessed, its checksum computed and compared to a known checksum), which makes them time consuming with large archives. The issue is that you want to know about file corruption as soon as possible, because in many cases corrupted files are the first sign of a deteriorating hard drive, which needs urgent replacement.

Other measures to prevent such dramatic failure are:
- exclusively using industrial-grade hard drives as opposed to the cheap drives discounted at only a fraction of the price (the magnetic hard drives in many older Mac computers, interestingly, were NOT industrial grade and got replaced immediately upon purchase of such a machine)
- SSDs have similar grades, with high-end drives reserving a very large part of their actual capacity for correction
- heavily stress test and scan hard drives upon purchase, BEFORE using them actively for important data
- regularly run hard drive scans (there are many software solutions for this; one of the most widely used is DiskWarrior for Mac, a must-have that every Mac user should run regularly)
- replace hard drives on a regular schedule (on my daily worker, always a MacBook Pro, I ran replaceable internal magnetic hard drives for at most 1-2 years; now, with non-replaceable SSDs, I do more extensive scans and limit professional use of the machine to at most 3 years)
- external magnetic drives in regular use (archive and data drives) have a maximum life of 2 years before replacement, after which they go into the backup drive cycle (with extensive stress testing before and during use)
- external SSDs I use exclusively as perishable media because of their high risk of total failure (think transporting unimportant data from one machine to another, but never as a backup drive or working drive for important data)
- use ECC RAM when your computer allows it
- regularly run the Apple hardware test routine (which also tests the RAM banks)
- most important: if any drive ever acts up, it is immediately replaced, and destroyed if deemed unsafe (if a stress test and hardware scan deem it repairable, it lands in the "unsafe for important data" junkyard)
- I also avoid buying again from brands that have had a complete drive failure on me before (such as some of the fancy design brands pushed in Apple shops)
- I have had exclusively great experiences with internal and external drives from OWC, WD and Seagate

For all these regular checkups I run repeating iCal reminders (together with battery refresh schedule reminders for mobile devices, …). I use SuperDuper in combination with saved profiles and scripts for all drives.
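To make the point about flipped bits concrete, here is a small, self-contained demonstration (the file names and the byte offset are hypothetical examples, not anyone's actual workflow): it copies a file, flips a single bit in the copy, and shows that the OS reads the damaged copy back without any complaint, while a checksum comparison catches the change immediately.

```python
# bitflip_demo.py - a single flipped bit is invisible to a plain copy/read
# but is caught by a checksum comparison. File names/offset are hypothetical;
# the original is assumed to be larger than the chosen offset.
import hashlib
import shutil

ORIGINAL = "L1004567.DNG"             # hypothetical original file
DAMAGED = "L1004567_damaged.DNG"

shutil.copyfile(ORIGINAL, DAMAGED)

# Flip one bit somewhere inside the copy.
with open(DAMAGED, "r+b") as f:
    f.seek(1_000_000)                 # arbitrary offset inside the image data
    byte = f.read(1)
    f.seek(-1, 1)                     # step back one byte (relative seek)
    f.write(bytes([byte[0] ^ 0x01]))  # XOR flips the lowest bit

# The OS reads the damaged file happily; no error is raised.
with open(DAMAGED, "rb") as f:
    data = f.read()
print(f"Damaged copy read without any OS error ({len(data)} bytes).")

# Only a checksum comparison reveals the difference.
def sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

print("Checksums match:", sha256(ORIGINAL) == sha256(DAMAGED))  # prints False
```

Whether a raw converter still renders such a file depends on where the flipped bit lands; a flip inside the image data often shows up only as a stray pixel or a short streak, which is exactly why a rendered preview proves nothing about integrity.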
Adam · Posted May 23, 2016 · #44

Thanks for sharing this info. But may I ask: did you compare the level of degradation before and after you took all or some of the measures you are suggesting? I am just wondering what improvement in file degradation we are talking about, in both relative and absolute terms.
colonel · Posted May 23, 2016 · #45

Ditto on backup from menos. I store my photos:
1. On my main PC
2. On a USB backup drive
3. On a NAS
4. On a USB drive off-site (in my brother's house)
5. Best photos uploaded to flickr (mostly on the private or friends-and-family setting only)

Latterly I am also:
6. exploring Dropbox for raw backup ...
7. running/building a Raspberry Pi 3 backup server, which I want to back up to a local hard disk attached to it and also, by a background agent farming data out 24/6, to remote backups (flickr, Dropbox and eventually the hard disk at my brother's house over the net)

You can call me paranoid. All good fun ...... rgds
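For the local USB and NAS copies in a setup like the one above, a scriptable one-way mirror is often all that is needed. The sketch below simply shells out to rsync, which must be installed; all paths and hostnames are invented examples, and the optional `--checksum` pass forces content comparison instead of trusting size and modification time.

```python
# mirror_backup.py - minimal rsync wrapper for local/USB/NAS mirrors.
# Assumes rsync is installed; source and destination paths are made-up examples.
import subprocess
import sys

SOURCE = "/Volumes/PhotoArchive/"          # trailing slash: copy contents, not the folder
DESTINATIONS = [
    "/Volumes/BackupUSB/PhotoArchive/",    # example local USB drive
    "pi@backup-nas:/mnt/backup/photos/",   # example remote target over SSH
]

def mirror(dest: str, verify: bool = False) -> int:
    cmd = ["rsync", "-a", "--partial", "--human-readable", "--stats"]
    if verify:
        cmd.append("--checksum")   # slower: compares file contents, not just size/mtime
    cmd += [SOURCE, dest]
    print("Running:", " ".join(cmd))
    return subprocess.call(cmd)

if __name__ == "__main__":
    verify = "--verify" in sys.argv
    failures = [d for d in DESTINATIONS if mirror(d, verify) != 0]
    sys.exit(1 if failures else 0)
```

The sketch deliberately omits `--delete`, so a file that is corrupted or accidentally removed at the source does not immediately wipe the good copy in the mirror; that matches the earlier advice to spread backups over time rather than keep one perfectly synchronized clone.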
Adam · Posted May 23, 2016 · #46

That is a lot of redundancy. But storage space is cheap; it is more a question of time and keeping track. I back up via Time Machine and have an additional backup on an offline disk, which is a good idea considering the prevalence of ransomware. The probability of triple failure is low, unless there is some electromagnetic event in the area. But I do not like Dropbox or Google Drive. I was thinking of renting space on Amazon. SmugMug used to have a service called the Vault, but they discontinued it. Still, I plan to upload my most dear TIFFs there into a private gallery.
menos I M6 · Posted May 23, 2016 · #47

(quoting Adam, #44)

The improvement in numbers for file degradation before and after implementing backup measures and following hardware guidelines and countermeasures is of course immeasurable, as I didn't produce studies or statistics over the years.

But. (That's a big but in a single sentence with a rather large period ending it, followed by a pause for effect, …)

I implemented all these measures not to counteract the effect of file degradation, but as a direct reaction to a row of terminal hardware failures. These failures included internal HDD crashes on my then MacBook Pro 15" (the silver aluminium first generation, a long time before the unibody came to market). I needed several days to recover and be fully back up to work. Result: I implemented a proper backup workflow with SuperDuper.

Then a series of external hard drive crashes. Let's name some products here: ALL (seriously, this is no joke), ALL of my G-Drive external HDD products (about 6 drives total) died fatally (meaning non-recoverably) around the time the warranty expired. Result: I started to build manufacturer redundancy into my backup guidelines, meaning: when I replace two old external drives, I buy two sets of identical drives from TWO DIFFERENT manufacturers (say, two WD and two Seagate products). I then run these for a couple of months or years and see how they behave: any malfunctions? Any unexpected degradation in performance? I found out this way that the very best external drive solutions come from Western Digital, Seagate and, to my surprise, LaCie. Drives from these three brands have been thrown, dropped and lugged around the planet in backpacks and office packs, in daily use, plugged and unplugged several times a day, every day, for years. Not a single drive from these three brands has failed on me.

If I had not implemented all the named measures, I would have spent many days recovering, whereas with these measures fully implemented I don't break a sweat when something goes wrong (and it will; it is not a question of if, but only of when and how bad).

Regarding my software workflow: I am always interested in foolproofing, streamlining and increasing safety. Only after I found damaged, unreadable image files on a few occasions did I learn about checksum checking and then search for weeks for a really streamlined implementation of a good workflow. It turns out: THERE IS NONE! Basically, the image-related software houses who sell us image management software, and the camera makers (who should have a big interest in the topic themselves), seem to almost entirely ignore this issue. The gist is: corruption only happens when something malfunctions, so don't let your gear (which we are not responsible for) malfunction; you are on your own.

Adobe has a half-baked feature implemented which they only allow to work when you convert every single imported image file into their proprietary DNG format (which is NON-standard); hence all bets are off when one does not use this format. Apart from that, you have to use commercial checksum software. The easiest to fit into a workflow were the two solutions I posted earlier; I am sure others have appeared since I started using them. I hope at some point Adobe will offer a universal solution for the entire image library.
bencoyote · Posted May 23, 2016 · #48

Quoting an earlier post: "Should it not be the role of the OS to guarantee a bit-by-bit perfect copy? It does, does it not? And regarding the opening of files: a few programs, and more to follow, use a known construction. For example, Java .jar files and MS Word .docx files are ZIP files containing separate components such as format, text, images, macros and so forth. During Microsoft's early, buggy use of .docx I would sometimes decompress them, manually patch the components, re-zip/rename and return the file. There is a lot more of that to be done."

As an up-to-date practitioner of the art, I will say the OS does not provide bit-by-bit perfect copies. It is a best-effort thing at the level of hardware we are all using. It is mostly reliable, and once data has been copied it should remain a perfect copy, but digital data does decay, and this rate of decay is much higher than most people realize. A file that you haven't accessed in years may turn out to be corrupted when you finally do. I will honestly say that it is the silent, undetected corruption of unread files, happening after the fact, that probably scares me the most. (Keep in mind my level of fear is fairly low.) There are some enterprise hardware and software stacks that come very close to providing that level of guaranteed protection, but they cost well more than a new M-P and a full set of Summilux and Noctilux lenses. :-) That is why I believe it has to be intrinsic to the file format itself.

Be aware that there are differences between things like checksums, which can be used to detect when a file has been corrupted or modified, and things like block codes, which not only provide error detection but in many cases correction. What you describe is more along the lines of integrity verification and corruption detection than the kind of intrinsic block codes for error correction that I'm referring to.
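The difference between detection and correction is easy to show with the simplest possible block code: one XOR parity block over a group of data blocks, which is essentially what RAID 4/5 does. A checksum can only tell you that a block went bad; the parity lets you rebuild it, provided you know which block is lost and only one block per group is missing. This is a toy sketch for illustration, not a real storage format.

```python
# parity_demo.py - toy illustration of detection (checksum) vs correction (XOR parity).
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (like RAID 4/5 parity)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# Four equal-sized data blocks plus one parity block computed over them.
data = [b"AAAA1111", b"BBBB2222", b"CCCC3333", b"DDDD4444"]
parity = xor_blocks(data)
checksums = [hashlib.sha256(b).hexdigest() for b in data]

# Simulate losing/corrupting block 2 entirely.
damaged = list(data)
damaged[2] = None

# Detection: a checksum mismatch says "block 2 is bad" but cannot restore it.
# Correction: XOR of the surviving blocks with the parity rebuilds the lost block.
surviving = [b for i, b in enumerate(damaged) if i != 2]
rebuilt = xor_blocks(surviving + [parity])

assert hashlib.sha256(rebuilt).hexdigest() == checksums[2]
print("Rebuilt block 2:", rebuilt)   # b'CCCC3333'
```

Real systems use stronger codes (Reed-Solomon, the ECC inside SSD controllers, RAID-Z in ZFS), but the principle is the same: redundancy added at write time is what makes repair possible at read time.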
Adam · Posted May 23, 2016 · #49

Plus, it is inherently impossible to state that a copy is identical, since the act of verification might itself impact the copied file. One would need to keep verifying that the verification algorithm did not impact the file. The thing would run in a loop, but would still inherently be unable to determine whether two files are identical. In most cases a comparison of random segments is fine. I think that, when properly backed up, our photos are safer now than when we kept them in boxes. But the ones in boxes are more fun to look at...
pico · Posted May 23, 2016 · #50

Quoting menos I M6 (#47): "Adobe has a half-baked feature implemented which they only allow to work when you convert every single imported image file into their proprietary DNG format (which is NON-standard); hence all bets are off when one does not use this format."

Adobe developed DNG. How could it not be standard (compliant with their own directives)?
bencoyote · Posted May 23, 2016 · #51

Much of what menos writes above (#43) is not entirely accurate.

There really aren't industrial-grade drives. There are primary drives, drives intended for archival storage, drives intended to be used in arrays, and several other varieties. These are optimized for different uses, but all are generally quality devices. The drives found in Macs are standard primary-storage drives. All drives have spare space for remapping data onto different locations.

Doing heavy stress tests on new drives accomplishes nothing. That may have been useful 20 years ago, but not with modern drives and modern controllers. The drive will always appear to be fine when new, and the OS is no longer the one mapping bad blocks; the drive firmware is.

SSDs aren't as diverse as magnetic storage, but they do have reserved space to bypass failing blocks. Some things are very different in SSDs, though. The firmware reports less storage than the device actually has and reserves the rest to bypass malfunctioning blocks. It then tries to level out the wear of the flash, so it is continually writing across all of the available storage. There are two important implications: if you have a big card and store less data on it, the flash gets written less often and the card wears more slowly; and if you fill up a card, you are more likely to hit marginal blocks as the card wears.

Scheduled replacement of drives based on the calendar is unlikely to accomplish much. Too many factors.

ECC RAM is a good idea, but only if you can have end-to-end data protection. If your PCI bus doesn't have ECC, there is little benefit. If you are using a NAS, then your network card must have ECC too.

If you don't have some sort of block code like ECC built into your file format or the file system, it is very hard to detect and correct errors. Your best defense is to use arrays of drives. A simple choice might be Drobo or something like that. (I don't work for them, get any kickback or referral bonuses or anything, and I don't even know anybody who works for the company. I just mention Drobo because I know how to do this right, and doing it right costs a fortune and is very complicated to set up. Knowing the range of problems and the technical difficulties, I can't think of anything else SIMPLE enough to mention on a forum.)

All drives fail. No brand is much better than any other, and what you think you know from last year is probably not correct now. Some vintages are good, some not so good. When a drive acts up, yes, that is the time to copy the data off it and start using something else. Only nurse things along until you can get the data off it.
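On the "when a drive acts up" point: most drives expose early-warning counters through SMART, and checking them costs nothing. The minimal sketch below shells out to smartctl from the smartmontools package, which has to be installed separately; the device path is only an example and differs per machine and operating system, and SMART catches many but not all impending failures, so it complements rather than replaces checksums and backups.

```python
# smart_check.py - quick SMART health poll via smartctl (from smartmontools).
# smartctl must be installed; /dev/disk0 is only an example device path.
import subprocess

DEVICE = "/dev/disk0"

def smart_health(device: str) -> str:
    """Return smartctl's overall health report text for one device."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True
    )
    return result.stdout

if __name__ == "__main__":
    report = smart_health(DEVICE)
    print(report)
    if "PASSED" in report or "OK" in report:
        print("Overall SMART assessment looks fine (still no substitute for backups).")
    else:
        print("WARNING: get the data off this drive and replace it.")
```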
pico · Posted May 23, 2016 · #52

(quoting bencoyote, #51: "[... snip excellent article ...] When a drive acts up, yes, that is the time to copy the data off it and start using something else. Only nurse things along until you can get the data off it.")

When a HDD reports a serious error, DO NOT REBOOT. As soon as you can, transfer its contents to a new drive.
bencoyote · Posted May 24, 2016 · #53

BTW, if anyone wants to induce a panic attack by reading up on the subject, this is a good place to start: http://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf It is kind of old, but it introduces the factors that cause the problem and discusses countermeasures still being worked on in various parts of the industry.
jmahto · Posted May 24, 2016 · #54

(quoting bencoyote, #53)

I am saved from a panic attack since I got this: "URL /adsl/Publications/zfs-corruption-fast10.pdf was not found on this server. Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request."
jaapv · Posted May 24, 2016 · #55

Me too - maybe that is the scary part - the page was killed off by the Langoliers.
Adam · Posted May 24, 2016 · #56

There was garbage at the end of the link, probably a tracker or login stuff; once you remove it, the page is there. So it was not eaten, just obscured. The link to the PDF file did not end in .pdf. http://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf
flyalf · Posted May 28, 2016 · #57

Compressed files are a tradeoff between smaller file size and more processing power needed. I am not stating this as a practical con, just a reminder that there is still no free lunch. Btw, I am trusting that archiving files regularly in the cloud (as opposed to syncing) will protect against cosmic-ray damage. Should I not trust clouds in this respect?
jmahto · Posted May 29, 2016 · #58

(quoting flyalf, #57)

I don't trust data in the cloud. What would you do if they simply lose it?
flyalf · Posted May 29, 2016 · #59

(quoting jmahto, #58: "What would you do if they simply lose it?")

Revert to the files on my NAS (RAID 5).
bencoyote · Posted May 31, 2016 · #60

(quoting flyalf, #57: "Compressed files are a tradeoff between smaller file size and more processing power needed... there is still no free lunch.")

Just a note to bring you up to the current state of the art. This was mostly true up to about 10 years ago, but it is no longer necessarily true (in fact it is mostly wrong), for a counterintuitive reason. There is also a cost in power and time for moving data, even just from RAM to the CPU caches, and it is becoming increasingly common for this cost to be greater than the cost of compressing and decompressing the same data. RAM and storage are simply that much further away from the processor than the cache. So statements like yours are probably not currently true. It isn't so much that your underlying facts are wrong; the world changed underneath you. The end of Dennard scaling and the rise of mobile smartphones driving innovation in the battery-conscious embedded space changed everything.

BTW, I would almost bet that we'll get about 10% more shots per battery charge by using DNGc (compressed DNG) vs uncompressed DNG. Anybody tried this?
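A crude way to feel this tradeoff on your own machine is to time writing and reading the same buffer to disk raw versus zlib-compressed. The sketch below is purely illustrative: results vary wildly with CPU, drive, OS caching and how compressible the data is, and it only stands in for the general idea, not for how a camera writes compressed DNG.

```python
# io_vs_compression.py - rough timing of raw vs compressed write/read of the same data.
# Results depend heavily on hardware, OS caching and how compressible the data is.
import os
import time
import zlib

# Moderately compressible fake data: half random, half zeros, about 64 MB total.
data = (os.urandom(1024) + bytes(1024)) * 32 * 1024

def timed(label, func):
    start = time.perf_counter()
    func()
    print(f"{label:<24}{time.perf_counter() - start:6.2f} s")

def raw_roundtrip():
    with open("raw.bin", "wb") as f:
        f.write(data)
    with open("raw.bin", "rb") as f:
        f.read()

def compressed_roundtrip():
    blob = zlib.compress(data, 1)      # fast compression level
    with open("comp.bin", "wb") as f:
        f.write(blob)
    with open("comp.bin", "rb") as f:
        zlib.decompress(f.read())

print(f"raw size: {len(data) / 1e6:.0f} MB, "
      f"compressed: {len(zlib.compress(data, 1)) / 1e6:.0f} MB")
timed("raw write+read", raw_roundtrip)
timed("compressed write+read", compressed_roundtrip)
os.remove("raw.bin")
os.remove("comp.bin")
```

On slow media the compressed path often wins despite the extra CPU work, which is the point being made above; whether the smaller per-frame transfer really yields the guessed-at battery savings in camera would need an actual test.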