From 6ad5dd777c4023bb2f19586fa30b4768e0ba65fb Mon Sep 17 00:00:00 2001
From: gregor herrmann
Date: Sat, 2 Jul 2011 17:23:24 +0000
Subject: [PATCH 1/1] [svn-inject] Installing original source of dupmerge (1.73)

---
 changelog.txt |   97 +++++
 dupmerge      |  Bin 0 -> 22440 bytes
 dupmerge.c    | 1141 +++++++++++++++++++++++++++++++++++++++++++++++++
 readme.txt    |   45 ++
 4 files changed, 1283 insertions(+)
 create mode 100644 changelog.txt
 create mode 100755 dupmerge
 create mode 100644 dupmerge.c
 create mode 100644 readme.txt

diff --git a/changelog.txt b/changelog.txt
new file mode 100644
index 0000000..395d401
--- /dev/null
+++ b/changelog.txt
@@ -0,0 +1,97 @@
+changelog.txt from dupmerge
+===========================
+
+Circa 1993: Initial version, Phil Karn, karn (at) ka9q (dot) net.
+
+
+1998-02-12: Last version from Phil Karn.
+
+
+2004-12-07: Version 1.1
+Added swap macro, inverse mode, no replacing of zero length files, void casts
+for semantic checkers, version number, sorting of equal size files by dev
+and ino and name, ...
+Switched to C99.
+Tested with SuSE 9.2 and Debian (both Kernel 2.6), tested with 4 and 7
+Gigabyte files, coreutils sources, md5-collision files, changed to
+reading/writing/comparing 8 byte blocks (instead of 1 byte blocks):
+2 times faster now. Added Todo-List.
+Rolf Freitag, rolf.freitag at email.de
+
+
+2004-12-01: Version 1.21
+Bugfix, because in the old version 1.1 there are two missing braces.
+These missing braces caused files of the same size to be
+compared only in the first 64 bits, so some different files were linked
+together from that version!
+So all future releases and cvs versions will be tested before release/commit
+with the coreutils-5.2.1 sources to assert that in new versions there will be
+no (new) errors.
+These are the test results from du -sk:
+before dupmerge: 31628
+after original version 1.0 from Phil Karn: 29968
+after version 1.1: 29712
+after current version 1.21: 29968
+Rolf Freitag
+
+
+2005-02-10: Version 1.3
+This version has a help text and a fully tested sparse mode which replaces each
+file which can be shrunk by sparse copying. The inverse normal mode now
+expands all hard links in O(n) (was O(n*log(n))) and has a new combo mode in
+which first all files are replaced by their sparse copy if it is smaller and
+after that files of size > 0 with the same content get hard linked to the
+oldest file with lowest disk usage (the eldest of the most sparse files). With
+the -i option the inverse will be done. This version now "compresses" as much
+as theoretically possible by removing redundancy with linking and sparsing. It
+saves approx. 20 % of disk space.
+Rolf Freitag
+
+
+2005-04-17: Version 1.4
+Added deletion mode (option -d), which deletes multiple files.
+This mode can be used e. g. for clearing movie and picture archives.
+Rolf Freitag
+
+
+2005-04-27: Version 1.5
+Because the old versions of dupmerge use fgets to read the filenames,
+filenames with newlines cannot be read. Therefore fgets is replaced since
+version 1.5 by fread and the old sequence operator '\n' is replaced by '\0';
+the input file names must now be separated with zero and not newline. This is
+now perfect because a file name is a zero-terminated string.
+So instead of
+find ./ -type f -print | dupmerge
+now you have to call
+find ./ -type f -print0 | dupmerge
+Bugfix of the wrong version number.
+Tested with several files with several newlines in them.
+Rolf Freitag
+
+
+2005-08-31: Version 1.6
+Added changelog and several file checks because fread does not distinguish
+between end-of-file and error.
+Fixed the "started" message, which was not suppressed in quiet mode.
+Updated header.
+Rolf Freitag
+
+
+2007-10-29: Version 1.7
+Added date and time to startup message.
That's important in case of filesystem + errors like directory loops and an infinit looping find. +Added const qualifiers to the two main arguments as minimal write protection. +Added volatile qualifier to the i_exit_flag for safe ipc. +Added s option for doing soft linking instead of hard linking, but a) it needs + several runs to make all links and b) the inverse is not implemented yet. +Added Cygwin support simply via #ifndev __CYGWIN__; it can now be compiled + under Cygwin without modification. +Tested successfull several times with coreutilities and other stuff. +Dr. Rolf Freitag + + +2008-03-01: Version 1.73 +Added some comments, checked with coreutils-5.2.1 (find ./ -type f -print0 | +dupmerge) and my archive (about 500 GB in one million files on an encrypted partition). +Dr. Rolf Freitag + diff --git a/dupmerge b/dupmerge new file mode 100755 index 0000000000000000000000000000000000000000..75d59bd312e9229d5871b56bda19b448a8812288 GIT binary patch literal 22440 zcmeHve|!|>wf7`!Bw%n-3l>}GJKBJxLLdT`peRCuzlcx)rG+#s$%aIeY}}pYM}=!} zcMH>DF})h4x8xS8wYImH($?@&B^n4f0kt-w1r!RgqRy7w8kHi*kGW4HJ@Z8m&Or|daW?~(RB_UH3&@#Prv!XDOUdguW ztveLOI^xM>ItZc->58<=dfY0?ETp5LN9w^X6XjwXOqHOiMB7XnU?#VY`C8PkMIF;n zz)WuaH{aIF0e<3g`QM59RQa;Ni;48el(E%Si^o=%kEyPz4K2?L`tz=_>ma(+-+aeB z;mv-yJYC(*3C$^DB0`!nFnt8_iAZ@!{gH+v@fu)f$zSrh5=lk+da_he#^F8(=?h5z z2MMBx?FJ#g9EsN`B=&zG(#1$$Ldr!NkHkKWMH-1jS$JJ#XP@~i_W8`(z7_0u>U9YK z>WMzYYcLXhC-us5fMLkLjHDrb73m73OOb{keGMrSi5G2h;dM0%E<+lFl#etT=_^PV zULJ>d0dn$w(1Cw~d%`Onc(PN!&Vj3)`e%^ydK-!Ow55A#xS?I#OT*1;4)9Cl#4mIB zk&bX7QdSH6Uf@dkfRoga0UUUY9uSbqS;>u6%w6 zo{gOJmpkB?4I`jn& z{A~wz>1R7I{fYS3J8%Fw;Ti|#7$nTN#7jdWyaO=h#V!2){lq}0zR{`QgM4>Ne>A7Q z+^q*(|7I`!Ih5~7!Ow8;e~p~&J&wfdBP7Ca5y0i@d%A=F4F|s&d8&L{oO)M3f3V4w zKO-l<+mP0N-}TRn6$P=dPOl93yyc99TT&I&eSzuK-eAxdBu;HvD1cky zOjJrsm(=)cOM|*MpqG{^r4?1RRZ1y-bfN}yV&(I;hIiNU?z?u`(h7gzUS*lLO0TOb 
zXL*^8QF`kvuq&^sEe!>I#AXZ9!%OOL+x2&obOnm+dCAgwS>dU;<)&4T2##a;c>BuCZbi`g8?yX^2Kg=sE&q(6)-o~_RTuM(02e*tam?r=1xk3Y+T~&oCEkpUep_B223E{R~Vr9D^B{6gjpsI9VX?4_Xg% zbT^ZEKg4F{2!7ejF}V$94n9MfW0VhLj>%hNj^UBRJPUI$bM#;|b4-Hcn5&qpnM2@7 z%rRM9&m047DsxPXh0KRwoyHs!+Dzt{z-Kc@ke8uN^|!`+$In^2`S33{}qqjGPXl9+f3mPm{^%e5Y^ z#0UixmAs92C2)=K5N2JxQ{YU(G;zFD;4H$NuHst-&L*r8-X!o~!jN2vHwZkGFeFss z^#TtgJc)3fz#8GHgck{%L--cLvjrYScsAjw0*@viF+(lRuSSM@| z&K7t%;U2<@z$*!#BHVM1{jVpC(U1NM93h-ZxKrS@gtG~^3cQZ+P{LaUZXm1?-X!oN zghvr>5O@ROafIsy-bi>7;W~jg5uQqTk-$$7zJ>5?!1@3E9plaLv7Xs;@2sqU7Bvm{ z`Q}}f$}h2(QPH6j4Gr-1IpJ;(`>ZTj-KO8EDCWaw(YNS6J;S;OT;?y>+twhLJud2} zyo6R1J?r?eaI5F>tpF1HHZwD`qx(X~+cQ|71|l1O^I2@0qOJO+w3~vP0Ue$kmL)n} z#x{H6o5h*o)9LCXyTYeE>Lab0yF%IF(*p#)SYSP?BjW`5gV)}+@M%?lZ*N=dX^eV| z+m4K5sN!u$hLybQcJ7R~J2HBbrBCCIOgb{QC+~LP4qbnsBcsc~18;q#65Ao$49Iua zoYmd*C$lKi3}vA+_nSjwMH|o~#0kYH%*iws=6I}h=xwU*kDk;Qm|u-GP6AUGWv+vA z6iS||&ooOmn4yhk?0TS#Zz5v;`vYDN<(NecW~j$3e#9&}k>8&G;(#M+{YkzYO83l!R_n?%Gw&xbY@3kj%OU!yqjCm1H+SJ-3?LD_B51Px1-Gqh+^A7t;C+^ zFZy@)UGRpL=BmtSLS5g1vdvU>{vq`AD{;cp3y!LhEpA1$RgL}{fH~wW&`x^3oTeX% z6+IH0z$KZe!yI%9HRj;_-u`>6x6YvNY9uU_ov4T3To+#TRGMCz-``azDd<$Ecjg}o zpGq^N%8;t1Uo$6C`H2jU5VL4^nhXy(ZG}1IH55e?`k>Xj*)QXut-a7Dvv{|;YENuN z!kC$pO+^_ia|Xj;S!N>zgFr)D9ZNMqD09#rFru9jr|Eks%qkjOWtcJP)Ac#v>9MW_ zq9{+ZmA4ISpo_I%grC>%$e_B%suAubjGeK;{ljneS0n47FQc%ylq^pWAy7d7bF#w$Xzf;7J+k5>i0pYY>^_Y3?lgde`3 z4SH`cy}qoMR=>C4W6(O7$+-#A8gz zv5tQP3-k-Od17l5=g*)2sPl@Q#tWC4zm7C)1WG$@@|4b>KfmLqO#IGozbQ-n`K?nJ2B#86n;HpVkwk0tdk!unE?~f= ze}aaitwx6$;bg`7xrjf;upH~4UDK{czKWVyN%!%g@S-dZd7`0AA=(en0(y~!vP=f_ z)VjZ+F4`K}gOQbg$ohzw(dFm}dmIO3#srG4MA^kL{_0nljVz1qQ=^=X%_f#c52?|Y z!CQtGhv_=VMkW!e%^qifo)n4ID5q7kg`OIVQf~;Z)U);INs6OJ_8?BA8Ux+Rb~J*~ z$_{iRhS`ltaB~ApGaA6SxsMV0=U^6JOHS}tyERg7Oj54Q=?(2GZNj@qSVzKHjAo7O zThbpqC8%Y%_>_jP2YtL>0yMV%dPgyu6go^+=5Ub{HW@_po##bZs7(=lIl~?VoOKYP zdkLgP>i`6E!{>{C1BYLnaW!a*#8ySE?C5TW`ITTYQCfxiAG7I2J0|Jx0lh602r{oo zaSk-Iob(HGvh22Rw`p&IWY#Rw=CSRiDf4-_d#G6?UZ!X;Z!y*|oMp^3PMy^tH+FsW 
zM*Wn-2x7`!wP;t{1@q+v^z$;!Oa?JCo%EwYj|Id9?Lq47x<@_qh%C+!F)SqpDL6I( zA%ZsNBDxmmx)(pBYuROb|Gl}LJFWCK&<);d(`S?3A%^C`APWlwZS5%HIvMa`W@>Ow2gL;5Upe5`ROF+&!8BX!O?g3e7dk~GEbh&>6H`jUA1OHjD#bjiLPGMPOt~!ee9U@TS^DM@lU)VfnK`&;JCDT$# zH(8r-0geK5iue@<`;9j1pP<4T4>FiZoQAwtokbfZC&0aNG=w*c*KL*=b)w9uFBmVX zGu|j@Q)jdlykM3*p-y|{ajA|L0=eJ=b@~Tp<1Os8;G3bR3ihvjTz%+9*0u$HSg_w& zuc{A``^rlS9IsN>3ltuRjXXwWH?uT)QjPqAPz&h_T1clp_%VOiJc?hliChbs*n;}t zpYa=i06#DWh!pDw&=^$jT1_-Z|4lACyFTc(+-vWHdIqS?qE2((Yi3MGN5)6Fh0T9u zz)m$j{M~FM_gHZk zh3hj7WpRLJS@%FZ3c=m68sW0dY+`${(7{iTOvqtivc6%HU;^9;^k>O6Hm|E+BUu;+ zuNGG9D$14I58>3D8m*gpD91s%uv?k-K9^OaP+^Nn|Njg6Y_yqC%0L~wLHKp3IFg1E zPIzx|U}Cmsv5Z)?Oqp~3tV;m{ZK7!OcgCcH{nzo;-57dG%?%bug{6K z(Dhvu31(TScMB*oi8ybcc#my$n0pt{}i&iY>EbUHD8|NHvSM+M9*6 zLEIjgL+ss$8W{ok%s~fH7Ts%0Yfh0m*mVPDjm@$kZVi>2Lx^Oy%tjO2w_w%tyI&xL zME9AE>>MY3wQx_&Gaelr26GLDpJtrbIz+V%I z`WAKoL6=*9_E^y%9Vj_9*#n~!L>L%7SR9*C%5rg3lpB2vhCTiwT@5W#OJdlSsZq5M zv@&Aj!#M<+X+b-(qWv{!KmJ3oEokDPP}iS^g`~D2{#WcJ*$?Qv0cf)7A%~DOv>bXs z|9fM=Vc@{FGIe+UH`xA&(z+^-{+y(u6%V|SE_5*j!<5iq^8_zUXEHOaCu`^-IWvg( z*8O$^KSizOX9I8Adk0%zF#JO5%VHqe1XSp&6$q=QnT?Ck?+7 zC~ElkD2;y$KkB+yAxuGxk)c5iGgitH=(Tn#TPY z8=mfNIJmm47{`cUJ-{2N8rVkW?784DRgE;@h}dm_)fDOH zKo=1;Tf|&w8SSjS433G-7`uwCQ6`|+;ERk8js3YtjQ3?rX0TH!Te4$-xh2!NB`d?0 ztfJu`*pH>`zaA&U1F|1`;2+wLC26rAd*gp^Kh{L0r0Nl_0F%-HM~tJ^-G70@eUW>z z@5`QS-xaTq=+%P`}(Wv8=Dcx#%6AwKapmZh^s^tyz!l1NPPn z5xfK~bI5)j4#$Yz1eiAdFS|Ut2>8}=01o|&uqyG7Lh-TU9(DK(Lq5(6s$Bb11(_HnE*;I`iRUJ_6g9FQ zoHp|qu3(pX$F9eDx4Fv(i+2?qeVq4l;PpVJ8modh`K^&wy}+f99s<3w$3D@>?XeFu z%pnrHv34!uoIYH_`XM#48@Tx=_n+GQ~)i;T~lM6~(cJNR+Y zp+mLs1<}cOfrrtR4NCL}2z+HgVNfK`ftW2g2S7;~xV1I{8>@#lHOlh}E^7Xb^jKIh z){)|1E2bKEEA$+cm9W-8mr0MZ1_0u?pNXEW-pycR^vF?-4G75sqlZUdX!R9HZOuR% z_A;0qxCAY;CFp|l@Bv^@L@OJN;};>^8J!;;EXbfC447ny*hY>=)ndsCjv7+fYZ`T3 z73kE5y!b76S$%K;ayfsA&%#|SCQ5~B*p923r_fPoFxwpBu0DPYF(J@Ciu~}q5auw5 zjUIdU&fSwH2fT7@4{bo%RNookQ5!d)H;(b(e)%UkVZHZ`?fv`72D2kt&wP#=W$#Dw z^di|b4y%&$)=)OR4m9Y=njdgl_&$CpHhhN7@)(TMyxkeb)IqfECeZaBex;MkgAO4Z 
zjn9^O&nW>4}~p0eZOh3HRsHe`^_#J=0u^W(Tfo*lKag|W5xAGV;0515|zs!Gz5M5L zZueZ0=Ej!Lc(cWA(Tq}12$;DOPK_39&L79BL_9k%zOB^vKMO>n;=4NeM@4QvmLt<*hw7Q6d0cC9Gpzlsu3F4-d@CH``>qig~MR%POJgY zoPCfu1g-4u{WSvpks5zcX;g0Rk1}$=v2lVxgX|!oOYV4ZZ0>w{|shDU9N`{ zC9Q`u=WXQKb<~ot;k`rf>|^FiHlr>qEGq`V#%LRpOf(J<16tWMNfn7I&E? z$Bd7S9&9N04A`Yc=pJ|+@z6d%N1Bpwo^)!gTJ%}-uq9*tGuAmC9ozc;t2( zyLSF19mf+!IzU8#wf_OTn8VMq^`k};M_$^2ll}3irwsc~Ag=vB5bQI$G{|o+{%sj} zG9+Q+@su>4ngzjA$AC?qFVt z!Q761Z>CJkp@0 zkntLpPhrnbdcKjSM*a%+@pcrVn|;mzH#>@DxXCh0dLXYH0m53=W#IG-eJuVnkY7kC z^T7p2h^vzH?C*Zq$hLC)N-&TXM7y7LZtyk;;#b4t;4E2AFvJLyeJIS6T+diLhPw$y$+h*PL&z?%trm{P%gx%iD|IQ;g`wqhO5l(dH*`V;Nu(vp0`N zj$q|_Ce4T*M8&@hGaHHUJD#sRKLs=LSu^1I@JTgtIbLw-h<2h5H3&e&qSxF59J-~% zYm{LTtVZTxJcq40;ZxeOKUOp(a-x$uJelnoe}YBQDjJUO6h9#KmYd)FF7m9wTh82w z+;b=YhzQfLVQUs7Z_hYFPVuD)l#5Joc_qFCpgDv@WPK}&$dXmG zD<>76J=K!6JYFN#oW58M#KLi~bqlfNaDiCg?2C1bSXdNTR}yQA+v06~u_o;T7M3^G zS%`0UhP=Feu_DBR!L9wkO6mnK&ZG)&5s7jA*JPFKB42z=>Ef_9V6lrPv65Z%^~HLG zSnQ&YSjjHd_Qg6&EOzlmVkNtHfx4=91B+d}oLFKncCo52uU2BQiy!_8jyvK!4R}}# z*bHy)$l%684EUsgj4A>?CLoWE1Y8TyeZZK*0%3|jJKf$H99dhTe?+t*14kDY5 z3=VKeqwsX<8}Q1G=xeYUOUxm=fx&Px4CGw?PDXvK#X^01l)s8`=J*$IzFmKvqWh*_ z*R&C3`a*LM>DWz5<*J`UfJ$uuug$_3Ui`2y)}MD~7?2QsDSaH%=-&?0dnpbtG!ptz zd(peXQPD6fU{d=4DCThXfzO~32~ik6lrW^odio8l;q^BaWnR0rEw@ZxJCX|^m)~K_3 zjOM2Z20g~B;qwF3S*MJaU$S6Dri~I#pzxs4w3UV5$}(OFpFzPMBla{4go}AAAB~SP zn@I@2(##fVc45v?X>WsIvQ>5FF0+yC0YArVW(CIwOfb%ju?38S(PkrA&p_cYpz#6K zV6$}a>bC0s&@Lyc7=xIZhCYv`4j95Tc#%E8=ot?}!|_MY33D!;>w#vSL!mt)XrfZ0 z&~eS-G96-ryaTHynOUIvNFk;jo)#>5ohURVeSqsKE(Y#nsr?iydKkha>);>rSSM5e z<6J_piXO*CC$p-N1dnr;<9fdiEZJTq8z(-21pv$;Y!oQ#`4UV%Ho^4m3K$t9KlHLCu^lxi~_neZ1xPg3$%V}WIM2KE)b=t4_%YM+WHWcI}ix5U>Y2sZ@}l zk|=Zfhug$XLI~b>qIC5czUZb#7eZRw0qhrNz%9#po+)qI!y`Got@HbV*UX6)YqTgI zRjS)$zKg)1BLO;(RWT(Cz)2NvWBt81xc>R~xAZ~vQ#i*eYNQ@2Pc~UXhjLzWfi-Yy z>be(J4o!4OoEi+vnSpaUYY0L}!9+UW1KrMJ0QS&i~|~5sGGy>o`TcrdOod-eGB`m&^DLT@eW(b==otrMXqHB(kGL> zuKpuJ-d7OUOj!lm6N{P6;jxxl#}hm36s;j30+R$T>xbgh5^m)owRuAl+u72*3#Pzk 
zgtP7?wuCKPC=Qj*pGpX{hl;%wvR-}F#lvh8PhXy&f6{EB6Phvp&IosSFnpy4PvJR+ z&C(Ez*tGKnJ*!?}0GI-O_3mYU00GQFT#|_MJwrb-hs&E@@@CxjtKe!7v>RT5MMcp8 z$K=m%1EN|uoIV-dYin68JOxc~PUAV3$8pC@>)F?^NZ1ZUb4aR>!T;!z)<#tKE`HCm zlTELoKd_`V8)^!-Fgjpga}Y}@i1H0cWW5U_SKC(useXzC1~u{>m^jjEcTkOv2h@?l znTndWpVi4+We4Nl#Ta$I=&=-?!@AT~={^V|kN6DP|HUsmCk+>&`_^Jxh_BO#PqG(s zHV9UkSD}a|XR8o%K0RwL)*M~voh!Bu3R@P5B%dk)RPd))z*v@dT-43sK$8g}{9d}c z_F=lE%q#gPW8bF4GZ1 z(~reNC`*Z7KSF=Z;dTJq0i#F&Yub)G`@H;ly3cdZ;s+b&qVo)Sb4W7L4qdH4l<0Oy z7F80t)WS~F<`5-EI8C-0BO-kRZZsJE8HV9J^w~DBMYSy6phE>7X)S_zoDP_SiApmvgFgp$n!L}I1ntU#GeuH z|L^1zkDW$~gE;2w=>8=*czpQBl&nVz8OwjgUbJwoi2s536K#BIidFHaDE34fpNbUx zW|~G`dPj%P0L}Upy-9Xd^zI*YqBfvs9gNs` zxs(R-;N>K2iW8SHXs_xC9S!duhKCg;ZRW5XzG5!#Cr!dQNS+s4I3P)^Zu`^M@Uh0m z`Npn^ilQmAn>NmG%J2L+@AU=wt!s{`KUs4G%eh(dWN6J%fJZ;t1*y}G{rY5eOVcs* z)NJ9eajO2M`ql5C4hIEYC1l^Cn zobS7|Q6tJ-F~HdQ=*#{BPJSH`Q8#8xu+EF`>!x_Cmw8tNd*NE&QeQwBySS=$Y+0Q` zQAbo)muq8gu=BDp{>ZB^M73SMEY@dcLn zG;j`9`D?Wi`FYn=3{=X>E30bn3(mO<-z3Y|bbZvqre}q8&0nG6137wl;a~lFFt=O$DQ3x;J8I|fK*#myF|l>fg1SwG{_g!^78VO zK2??8K)J>*`RqE|fMCfYWOjp<-laaYh)?V0)dIs;ru)jZ;Ib+m!q_GzUx-od)7d8V zg|GK~a!C=)DrZ|nDIWCn_ zJX*Lp-lqk8{I>72toRwYwgSX?N7(*QK3WUavNz@QjZjKoium2+WX<`!(Dsr#cnT;a z)#XA4Jrt+~=^gWC&ZN}U@Qa}RA3i6T;&r&Q+qk-KBl4+c-Kes2+FJWfk-dQ&npCG89<7A_+y&eIakPsGegW2+Zng4WDspVoVbN7I0W3`5x+8ttNla3$E(+YhRUSLetW;+%TC3vgrD!q#_ z=J*vZ2T%Y?5%=dev!d5A_o0n%hN@uE8h<&;#?+$V4u82{U~R1Ebd0|i^fm(IsshTb zwUpfF4D-pFJt$$IK1Qjk#aIs1z@{XoOV(A>>~=3;d>FVYfKJ*0syrDSt17fAU4!Bl zS1noM_`sMT+MQADwS!jBU!f;`XH>bZ08%9N0;kH&)5Ht$)iFaZ`T2A&e#*S~;8}tn zBN^Qf&O+f~Ltn4YLu`aby%MhHuk~qx5b}y%e*nK)Uzsl$WPGAc7?7$Q&5W}B)Zv{J z7xM(jLbbXNUo6W&88aufUvp$e5`9h9?)-Fp+1mP~vVzcbwPIflYMg*N7 z1cwpjlaoIfAZ5w;&X27xi0-2?w5vEnBqup-RGlB6Zew7KK(w@{1jHN!6>6RGU}oXC z>*XsSha-4=Ht$@l3&o?HQ8`8Nci~OHiCil_->@7+>Vq zglfc7dNJL}Sro9C>mLR`{5l~S)Yaf!>|IV`fzY#FIsS10QrfTZhx$YcGD&Si{I4)n zSB(&%yTcMQl^fo3KfBfBT7oF-wp>5M_nbi|{U+t!E&G4v?}O&t1$0^e>B;=fmR`Ot 
zkLOa#e(&5LMovNa5R4DNY^gr~{agc|Yv6MYe6E4dHSoCxQZ;}xt%S0}o-^g%slch3)bnL!{8<65il_>+3M+JrouG-=n28+UDfe*Q!(sROQ4&b&EA+D!gMzFa&8{+#H4TM>D|O8liO-Md)HlZ{TDl84hTU*1jAZXKh0 zmpG98>BGEAFa9`JUipezoPFB49#Ha@)Q0k$Q-nT1De7Qn;@dZM)jAu&U+T)!earDH ze?KfQ;FrH@mFKH0t-wi*&&DbB1_Is{N?w`n4+PPeo!wIgreNr;sVYNbelWpLo^ZCg z;7?Im?eOo`twrGe67Jp0ZTDUjxOkg@ zw+VP#DVTE+k7+5Ai^spAIvW$kjb>+*X9_x%2Zw&O#P*4em8xyXapCM4dwc>H^( z)SG*`{CejJpsI@QRZI6VR`|{<@Xex6FvpKeA+nb1RmR`eEe*N`fJ6# iduh(iACZ%vdlB!clmhqGZPUf8#2 2^31 Byte, you should use the compiler option +"-D_FILE_OFFSET_BITS=64". + +Example for compilation (and striping + prelinking) on i686 ("PC" with Pentium III or higher or Athlon) +with shell function c (in ~/.bashrc): +function c { + gcc -Wall -I. -O3 -D_GNU_SOURCE -D__SMP__ -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT \ + -pthread -mcpu=i686 -fexpensive-optimizations -DCPU=686 -ffast-math -m486 -lm -lz -o $1 $1.c && strip $1 \ + && prelink -fmRv ./$1 && chmod +t $1 +} + +Example usage of this shell function (in ~/.bashrc): c dupmerge + +But a simple "gcc -D_GNU_SOURCE -O2 -o dupmerge dupmerge.c" also does the job. + +The inverse mode expands the hard links to simple files if used without the option -n. +With option -n hard links are only reported. +The reverse mode can be used e. g. for expanding backups, which have been shrinked with +with the default mode. + +Caution: If there is not enough space for expanding the hard links (in inverse mode) hard links +will be lost! It's also a bad idea to kill dupmerge with signal 9 before it finishes because +files can be lost or damaged or new temporary and maybe big files can remain. + +example for usage: +find ./ -type f -print0 | dupmerge 2>&1 | tee /tmp/user0/dupmerge.out + +Todo: + - progress meter with approx. 
percent
+ - change from comparing signed to unsigned numbers for clearer ordering
+ - inverse mode optimization: expand with the same file attributes (copy most of struct stat)
+ - option m for minimum block size; (not file size because of sparse files).
+   smaller files will not be hard linked
+ - ANSI-C (C99) compatible error handling of the comparison functions
+ - for option n (nodo): normal mode: show the number of blocks which can be saved,
+ - refactorisation: separate help function and bit fields +macros instead of legacy variables
+ - error message +advice when wrong parameters are used
+ - option f for using a soft link as second try when a hard link could not be made and the reverse for
+   inverse mode (inverse mode: expand hard and soft links)
+ - option z for not excluding zero length files from linking/expanding/erasing
+ - more user friendly output with e. g. freed block + saved space in kB instead of kiB or blocks (see IEC 60027-2),
+   extra explanation in output when -n is used,
+ - tests with ntfs partitions for MS-Win-users
+ - correct block counting (sparse files are not yet counted correctly)
+ - man page, info page
+ - autoconf/automake + rpm and deb package
+ - An option for scheduling in parallel with 2..N pthreads e. g. for SMP boxes with ramdisk.
+ - other features like a gzipped csv file with the file attributes from lstat of the files which were
+   linked
+ - an extra option for creating the tmpfiles on a specified partition for speedup with a ramdisk
+ - an option for calling dupmerge with find to shorten the command line the user has to type
+ - check if the user has enough permissions to replace files (in not nodo mode)
+ - self-calling, "user-friendly", mode with extra option which calls dupmerge with find via the system call
+ - test with files which can be deleted only via the inode (http://www.cyberciti.biz/tips/delete-remove-files-with-inode-number.html)
+ - An additional option for doing deletion irreversibly via shred or wipe or ... instead of unlink or rm.
+ - Checking for and reporting of filesystems where there are no links possible; e. g. no hard links with FAT.
+ - GUI
+
+ Version 1.73, 2008-03-01
+
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>  // strndup,
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>  // Necessary for unlink, getopt, link, fcmp.
+#include <iso646.h>  // for and, bitand, or, bitor ...
+#include <stdbool.h>  // _Bool
+#include <signal.h>  // signals
+#include <limits.h>  // UCHAR_MAX
+#include <sys/mman.h>  // mlockall
+#include <sys/types.h>  // vfork
+#include <unistd.h>  // vfork
+#include <time.h>  // localtime
+
+
+#define mc_MIN(a, b) ((a) < (b) ? (a) : (b))
+#define mc_MAX(a, b) ((a) > (b) ? (a) : (b))
+
+
+// Swap two items a, b of size i_size, use (static) swap variable c for performance.
+#define mc_SWAP_ITEMS(a, b, c, i_size) { memcpy (&(c), &(a), (i_size));\
+  memcpy (&(a), &(b), (i_size));\
+  memcpy (&(b), &(c), (i_size)); }
+
+// easter egg
+#define mc_ADVICE if (argc > 1 && 0 == strncmp (argv[1], "-advice", 8) ) \
+{ \
+  (void)printf ("Don't Panic!\n"); \
+  exit (42); \
+} /* 42: The meaning of life, the universe, and everything ;-) */
+
+
+// For registering the signal handler void sig_handler (int sig). This macro saves 32 bytes ;-)
+#define mc_register_sig_handler \
+{\
+  int i;\
+  for(i=0; i<=UCHAR_MAX; i++)\
+    signal (i, sig_handler);\
+}
+
+
+// union for copy and compare
+union u_tag
+{
+  int64_t i64;
+  char c[8];
+};
+
+
+int cmp (const void *, const void *);  // function for qsort
+int cmp0 (const void *, const void *);  // function for qsort
+int fcmp (const void *, const void *);  // function for qsort
+int fcmp0 (const void *, const void *);  // function for qsort
+int fcmp1 (const void *);
+int fcmp2 (const void *, const _Bool);
+_Bool different_files (const void *, const void *);
+
+const double d_version = 1.73;  // very simple version number of %1.2f format
+struct stat s_swap;  // swap variable
+int Nodo = 0;  // for nodo mode (only report)
+int Quiet = 0;  // for quiet mode
+int i_soft = 0;  // for using always soft links instead of the default hard links
+_Bool b_inv = false;  // inverse mode indicator
+_Bool b_sparse = false;  // flag for sparsing
+_Bool b_comb = false;  // combo flag for normal + sparse mode
+int i_minlinks = 1;  // minimum of found hard links
+int i_maxlinks = 1;  // maximum of found hard links
+long int li_minblocks = LONG_MAX;  // minimum of saved blocks by sparse copying
+long int li_maxblocks = 0;  // maximum of saved blocks by sparse copying
+int Files_deleted = 0;
+int Blocks_reclaimed = 0;
+int i_files_expanded = 0;
+int i_blocks_declaimed = 0;
+volatile int i_exit_flag = 0;  // flag for exiting soon (not immediately) and controlled after a signal to terminate
+char **a_names = NULL;  // array of file names
+char *a_flags = NULL;  // Array with the flags of the files in a_names. 1 = marked for deletion because another file with the same content was found,
+int nfiles = 0;  // number of files
+
+
+void
+sig_handler (const int sig)  // signal handler
+{
+  if ((SIGINT == sig) or (SIGILL == sig) or (SIGKILL == sig) or (SIGSEGV == sig) or (SIGTERM == sig))
+    {
+      i_exit_flag = 1;  // set the flag for a graceful exit
+//      if ((SIGILL == sig) or (SIGKILL == sig) or (SIGSEGV == sig))
+//      {
+//        (void) printf ("\a\a\a\a\a\a\a Signal %d, program exiting... \r\n\a\a\a", sig);
+//        exit (0);
+//      }
+    }
+  return;
+}
+
+
+// initialisation for main
+inline void
+main_init1 (void)
+{
+  Files_deleted = 0;
+  Blocks_reclaimed = 0;
+  i_files_expanded = 0;
+  i_blocks_declaimed = 0;
+  i_minlinks = 1;
+  i_maxlinks = 1;
+  li_minblocks = LONG_MAX;
+  li_maxblocks = 0;
+}
+
+
+// delete the files (of name a_names[0 ... i_files - 1]) which are marked for deletion
+void
+delete_marked (const int i_files)
+{
+  struct stat sa;
+  int i;
+
+  for (i = 0; i < i_files; i++)
+    {
+      if (i_exit_flag)
+        exit (0);
+      if (a_flags[i] bitand 1)  // delete
+        {
+          if (-1 == lstat (a_names[i], &sa))  // or ! S_ISREG (sa.st_mode))
+            {
+              fprintf (stderr, "lstat(%s) failed\n", a_names[i]);
+              perror ("lstat");
+              return;
+            }
+          if (0 == Nodo)
+            {
+              printf ("deleting %s, freeing %lld blocks\n", a_names[i], (long long) ((1 == sa.st_nlink) ? sa.st_blocks : 0));
+              unlink (a_names[i]);
+            }
+          Files_deleted++;
+          if (1 == sa.st_nlink)  // file a is no (hard) link
+            {
+              Blocks_reclaimed += sa.st_blocks;
+            }
+        }
+    }
+  return;
+}
+
+
+int
+main (const int argc, const char *const argv[])
+{
+  char **names, buf[BUFSIZ];  //, *cp; // BUFSIZ is the default buffer size in stdio.h, usually 8192
+  int i, j, i1, i_mode = 0, i_ferror = 0, i_ret = 0;  // integers, mode: 0: common, 1: deletion, i_ferror: for ferror
+  FILE *tmp;  // file pointer for tmpfile
+  const char c_zero = '\0';  // for terminating strings
+  time_t now;  // for actual time
+  struct tm *ptr_tm = (time (&now), localtime (&now));  // set actual time, convert to localtime
+
+  mc_ADVICE;
+  // read parameters for quiet mode and suppressing the actual unlinking and relinking. Rolf Freitag
+  while ((i = getopt (argc, (char *const *) argv, "cdhinqsSV")) != EOF)
+    {
+      switch (i)
+        {
+        case 'c':
+          b_comb = 1;
+          break;
+        case 'd':
+          i_mode = 1;  // deletion mode
+          break;
+        case 'n':
+          Nodo = 1;
+          break;
+        case 'q':
+          Quiet = 1;
+          break;
+        case 'h':
+          fprintf (stdout, "This program can reclaim disk space by linking identical files together.\n");
+          fprintf (stdout, "It can also expand all hard links and reads the list of files from standard input.\n");
+          fprintf (stdout, "Example usage: find ./ -type f -print0 | dupmerge 2>&1 | tee ../dupmerge_log.txt\n");
+          fprintf (stdout, "Options:\n-h \t Show this help and exit.\n-V \t Show version number and exit.\n");
+          fprintf (stdout, "-d \t delete multiple files and hard links. Default: Preserve the alphabetically first file name.\n");
+          //fprintf (stdout, " \t Switch for ascending/descending order in deletion mode.\n");
+          fprintf (stdout, "-q \t Quiet mode.\n-n \t Nodo mode / read-only mode.\n");
+          fprintf (stdout, "-i \t Inverse switch: Expand all hard links in normal mode, replace files by their desparsed version if it is bigger.\n");
+          fprintf (stdout, "-s \t Flag for soft linking (default is hard linking). This option is beta because for linking of all ");
+          fprintf (stdout, "equal files more than one run of dupmerge is necessary and the inverse (expanding of soft links) is untested.\n");
+          fprintf (stdout, "-S \t Flag for Sparse mode: Replace files by their sparse version if it is smaller.\n");
+          fprintf (stdout, "-c \t Combo mode: Default mode +sparse mode. With -i it means inverse mode with unlinking and desparsing.\n");
+          fflush (stdout);
+          exit (0);
+          break;
+        case 's':
+          i_soft = 1;
+          break;
+        case 'V':
+          fprintf (stdout, "dupmerge version %1.2f\n", d_version);
+          fflush (stdout);
+          exit (0);
+          break;
+        case 'i':
+          b_inv = true;
+          break;
+        case 'S':
+          b_sparse = true;
+          break;
+        default:
+          break;
+        }
+    }
+  if (not Quiet)
+    {
+      (void) printf ("%s started at %d-%s%d-%s%d %s%d:%s%d:%s%d\n", argv[0],
+                     ptr_tm->tm_year + 1900, ptr_tm->tm_mon + 1 > 9 ? "" : "0", ptr_tm->tm_mon + 1,
+                     ptr_tm->tm_mday > 9 ? "" : "0", ptr_tm->tm_mday,
+                     ptr_tm->tm_hour > 9 ? "" : "0", ptr_tm->tm_hour,
+                     ptr_tm->tm_min > 9 ? "" : "0", ptr_tm->tm_min, ptr_tm->tm_sec > 9 ? "" : "0", ptr_tm->tm_sec);
+      fflush (stdout);
+    }
+#ifndef __CYGWIN__
+  mlockall (MCL_CURRENT bitor MCL_FUTURE);  // be cached
+#endif
+  // Read list of file names into tmp file and count
+  tmp = tmpfile ();
+  if (NULL == tmp)
+    {
+      (void) fprintf (stderr, "could not open temporary file, exiting\n");
+      exit (-1);
+    }
+  nfiles = 0;
+  if (not Quiet)
+    {
+      (void) printf ("tmpfile (pointer %p) created, processing ...\n", (void *) tmp);
+      fflush (stdout);
+    }
+  while (!feof (stdin))  // while not end of input
+    {
+      buf[0] = '\0';  // set strnlen to 0 so that strnlen (buf, BUFSIZ) is 0 at EOF
+      for (i = 0; i < BUFSIZ; i++)  // read till the sequence operator \0 is found
+        {
+          fread (&buf[i], 1, 1, stdin);
+          i_ferror = ferror (stdin);
+          if (i_ferror)
+            {
+              (void) fprintf (stderr, "stdin: error %d.\n", i_ferror);
+              i_ret = -1;
+            }
+          if ('\0' == buf[i])  // termination found
+            break;
+        }
+      if ('\0' not_eq buf[sizeof (buf) - 1])  // no termination at buffer end
+        buf[sizeof (buf) - 1] = '\0';
+      if (strnlen (buf, BUFSIZ))
+        nfiles++;  // new file: increment file counter
+      if ((EOF == fputs (buf, tmp)) or (not fwrite (&c_zero, 1, 1, tmp)))  // store new file name in tmp file
+        {
+          (void) fprintf (stderr, "could not write to temporary file, exiting\n");
+          exit (-1);
+        }
+    }
+  // Now that we know how many there are, printf, allocate space and re-read.
+  if (not Quiet)
+    {
+      (void) printf ("Input: %d files, processing ...\n", nfiles);
+      fflush (stdout);
+    }
+  if (0 == nfiles)  // no input, nothing to do
+    exit (0);
+  rewind (tmp);
+  names = (char **) calloc (nfiles, sizeof (char *));
+  a_names = (char **) calloc (nfiles, sizeof (char *));
+  a_flags = (char *) calloc (nfiles, sizeof (char));
+  if ((NULL == names) or (NULL == a_flags) or (NULL == a_names))
+    {
+      (void) fprintf (stderr, "%s: Out of memory\n", argv[0]);
+      exit (-1);
+    }
+  for (i = 0; i < nfiles; i++)  // get the file names from the tmp file
+    {
+      for (j = 0; j < BUFSIZ; j++)  // read till the sequence operator \0 is found
+        {
+          fread (&buf[j], 1, 1, tmp);
+          i_ferror = ferror (tmp);
+          if (i_ferror)
+            {
+              (void) fprintf (stderr, "tmpfile: error %d.\n", i_ferror);
+              i_ret = -1;
+            }
+          if ('\0' == buf[j])  // termination found
+            break;
+        }
+      if ((NULL == (names[i] = strndup (buf, BUFSIZ))) or (NULL == (a_names[i] = strndup (buf, BUFSIZ))))
+        {  // The command "gcc -o dupmerge dupmerge.c" produces the warning "dupmerge.c:191: warning: assignment
+          // makes pointer from integer without a cast" but this is a warning bug; in the if statement above there is no int!
+          // To get rid of that warning you can cast strndup to (char *) or equivalent (typeof(names[i])) or use -O3. [gcc version 3.3.4]
+          // For later versions of gcc you should use the command line option -D_GNU_SOURCE, but that's not necessary under cygwin.
+          (void) fprintf (stderr, "%s: Out of memory\n", argv[0]);
+          exit (1);
+        }
+    }
+  (void) fclose (tmp);
+  mc_register_sig_handler;  // register signal handler before critical sections
+  qsort (a_names, nfiles, sizeof (char *), cmp0);  // sort only the (pointers to the) file names in a_names
+  switch (i_mode)
+    {
+    case 0:  // no deletion
+      switch (b_comb and b_inv)  // For desparse mode after normal inverted mode. Otherwise a second run would be necessary.
+        {
+        case 0:  // not combo mode and inverse switch
+          if (b_sparse)  // sparse/desparse files first because sparsing/desparsing expands hard links (changes inode)
+            {
+              // no sorted list of files at this point
+              i1 = Quiet;
+              if (b_comb and Nodo)  // combo mode and read-only (Nodo): do nodo sparsing/desparsing during sorting with output here
+                Quiet = 0;
+              else
+                Quiet = 1;
+              i = Nodo;
+              Nodo = 1;  // for only sorting the names
+              qsort (names, nfiles, sizeof (char *), fcmp);  // sort only the (pointers to the) file names
+              Nodo = i;  // restore Nodo
+              Quiet = i1;  // restore Quiet
+            foo:
+              main_init1 ();  // start with new init
+              // sparse or desparse
+              for (i = 0; i < nfiles; i++)
+                {
+                  // only different files, call fcmp2/3 only for the last file in a block of equal hard links
+                  if ((nfiles - 1 == i) or different_files (names + i, names + i + 1))
+                    (void) fcmp2 (names + i, not b_inv);
+                }
+              if (not Quiet)
+                {
+                  (void)
+                    printf
+                    ("Files%s %ssparsed: %d of %d, Disk blocks%s %sclaimed: %d\n",
+                     (Nodo ? " which can be" : ""), (b_inv ? "de" : ""),
+                     (b_inv ? i_files_expanded : Files_deleted), nfiles,
+                     (Nodo ? " can be" : ""), (b_inv ? "de" : "re"), (b_inv ? i_blocks_declaimed : Blocks_reclaimed));
+                  (void)
+                    printf
+                    ("Minimum of disk blocks which could be %s by %s copying: %ld, Maximum: %ld.\n",
+                     (b_inv ? "declaimed" : "reclaimed"), (b_inv ? "desparse" : "sparse"), li_minblocks, li_maxblocks);
+                  fflush (stdout);
+                }
+              if ((b_comb and Nodo) or (b_inv))  // all work done
+                exit (0);
+            }  // if (b_sparse), no break
+        case 1:
+          main_init1 ();  // start with new init
+          if ((b_sparse and b_comb) or (not b_sparse))  // sparse mode +combo flag or no sparse mode: normal mode, maybe with inversion
+            {  // Without the combo flag and with sparse mode there is no normal (maybe inverted) mode!
+ if (not b_inv) + { + qsort (names, nfiles, sizeof (char *), fcmp); + if (not Quiet) + { + (void) printf ("Scanning for more dups ...\n"); + fflush (stdout); + } + for (i = 0; i <= nfiles - 2; i++) // Second run: nessisary for qsort versions which do use cached values, where fcmp is not always called, + (void) fcmp (names + i, names + i + 1); // e. g. with gcc qsort at coreutil sources. + } + else + { + if (not Quiet) + { + (void) printf ("Scanning for hard links ...\n"); + fflush (stdout); + } + for (i = 0; i < nfiles; i++) + (void) fcmp1 (names + i); + } + if (not Quiet) + { + (void) + printf + ("Files%s %s: %d of %d, Disk blocks%s %sclaimed: %d\n", + (Nodo ? " which can be" : ""), + (b_inv ? "expanded" : "linked"), + (b_inv ? i_files_expanded : Files_deleted), nfiles, + (Nodo ? " which can be" : ""), (b_inv ? "de" : "re"), (b_inv ? i_blocks_declaimed : Blocks_reclaimed)); + (void) printf ("Minimum of found hard links: %d, Maximum: %d.\n", i_minlinks, i_maxlinks); + fflush (stdout); + } + } // if ((b_sparse and b_comb) or (not b_sparse)) + if (b_comb and b_inv) // do desparsing in inverse mode + goto foo; + break; + default: + (void) printf ("Unexpected switch error ...\n"); + exit (-1); + break; + } // switch + break; + case 1: // deletion + // no sorted list of files at this point + qsort (names, nfiles, sizeof (char *), fcmp0); // sort only the file names + // second run + for (i = 0; i <= nfiles - 2; i++) // Second run: nessisary for qsort versions which do use cached values, where fcmp is not always called, + (void) fcmp0 (names + i, names + i + 1); // e. g. with gcc qsort at coreutil sources. + delete_marked (nfiles); + if (not Quiet) + { + (void) + printf + ("Duplicate files (%s deleted): %d of %d, Disk blocks%s reclaimed: %d\n", + (Nodo ? "which can be" : "which have been"), (b_inv ? i_files_expanded : Files_deleted), nfiles, (Nodo ? 
" which can be" : ""), Blocks_reclaimed); + (void) printf ("Minimum of found hard links: %d, Maximum: %d.\n", i_minlinks, i_maxlinks); + fflush (stdout); + } + break; + default: + (void) printf ("Unexpected switch error ...\n"); + exit (-1); + break; + } // switch + exit (i_ret); +} // main + + +// For alphabetic ordering/searching. +int +cmp (const void *p0, const void *p1) +{ + int i = 0; + char **p2 = (char **) p0; + char **p3 = (char **) p1; + i = strncmp (*p2, *p3, BUFSIZ); // compare the two strings + return (i); +} + + +// For alphabetic ordering/searching. With global inversion flag for inverse mode. +int +cmp0 (const void *p0, const void *p1) +{ + int i = 0; + char **p2 = (char **) p0; + char **p3 = (char **) p1; + i = strncmp (*p2, *p3, BUFSIZ); // compare the two strings + if (b_inv) + return (-i); + return (i); +} + + +// This is the comparison function called by qsort, where the real work +// is done as a side effect. Due to ANSI-C the current version is not +// completely correct because if the same objects are passed more than once to the +// comparison function the results must be consistent with another; see 7.20.5 in C99. +// If errors like lstat failed do occur, this is not fullfilled yet. With fcmp1 it's the same. +// sort levels: 0: size (and accessability), 1: device, 2: content, 3.: name +int +fcmp (const void *a, const void *b) +{ + struct stat sa, sb; + FILE *fa, *fb; + int i1, i2, i_nlinka, i_nlinkb, i_fa, i_fb; + int rval = 0; + const char *filea, *fileb; + union u_tag u_buffer1 = { + 0 + }, u_buffer2 = + { + 0}; + void *buffer1 = (void *) &u_buffer1; + void *buffer2 = (void *) &u_buffer2; + _Bool b_swap = false; + + if (i_exit_flag) + exit (0); + // Nonexistent or non-plain files are less than any other file + if (NULL == a) + return (-1); + filea = *(const char **) a; + if (-1 == lstat (filea, &sa)) // or ! 
S_ISREG (sa.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", filea);
+      perror ("lstat");
+      return (-1);
+    }
+  if (NULL == b)
+    return 1;
+  fileb = *(const char **) b;
+  if (-1 == lstat (fileb, &sb))  // or ! S_ISREG (sb.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", fileb);
+      perror ("lstat");
+      return (1);
+    }
+  i_minlinks = mc_MIN (i_minlinks, mc_MIN (sa.st_nlink, sb.st_nlink));
+  i_maxlinks = mc_MAX (i_maxlinks, mc_MAX (sa.st_nlink, sb.st_nlink));
+  // Smaller files are "less"
+  if (sa.st_size < sb.st_size)
+    return (-1);
+  if (sa.st_size > sb.st_size)
+    return (1);
+  // We now know both files exist, are plain files, are the same size
+  // if both files are hard linked or zero length: sort by name
+  if (((sa.st_dev == sb.st_dev) and (sa.st_ino == sb.st_ino)) or (0 == sa.st_size))
+    return (strncmp (filea, fileb, BUFSIZ));
+  // We now know both files exist, are plain files, are the same size > 0,
+  // and are not already linked, so compare them alphabetically if they are on the same device
+  if (sa.st_dev != sb.st_dev)
+    return ((sa.st_dev < sb.st_dev) ? -1 : 1);
+  if (NULL == (fa = fopen (filea, "r")))
+    return (-1);  // Unreadable files are "less than"
+  if (NULL == (fb = fopen (fileb, "r")))
+    {
+      fclose (fa);
+      return (1);
+    }
+  // Loop for comparing the files in 64 bit (instead of 8 bit) blocks.
+  // On big endian machines it's alphabetic ordering, on little endian machines it's
+  // alphabetic ordering with 64 bit 'characters'.
+  // Due to C99, 7.19.8.1 read must be done with chars.
+  while (((i1 = fread (buffer1, 1, 8, fa)) != 0) and ((i2 = fread (buffer2, 1, 8, fb)) != 0))
+    {
+      // Mask the unused 1..7 bytes, because the standard says nothing about them.
+      memset (&(u_buffer1.c[i1]), 0, 8 - i1);  // start at the first unused byte buffer1.c[i1], end at Byte7 buffer1.c[7]
+      memset (&(u_buffer2.c[i2]), 0, 8 - i2);
+      // check for file errors
+      i_fa = ferror (fa);
+      i_fb = ferror (fb);
+      if (i_fa or i_fb)  // check for file errors
+        {
+          if (i_fa)
+            {
+              (void) fprintf (stderr, "file %s: error %d.\n", filea, i_fa);
+              rval = -1;  // error file: smaller
+              break;
+            }
+          if (i_fb)
+            {
+              (void) fprintf (stderr, "file %s: error %d.\n", fileb, i_fb);
+              rval = 1;
+              break;
+            }
+        }
+      if (u_buffer1.i64 != u_buffer2.i64)  // compare
+        {
+          rval = (u_buffer1.i64 < u_buffer2.i64) ? -1 : 1;
+          break;
+        }
+    }
+  (void) fclose (fa);
+  (void) fclose (fb);
+  if (rval)  // unequal files of same size
+    return rval;
+
+  if ((sa.st_blocks) or (sb.st_blocks))  // avoid linking sparse files with no blocks allocated
+    {
+      // The two files have same content, size > 0, blocks allocated > 0 and are on the same device, so link them.
+      // We prefer to keep the one with less blocks, or if they're the
+      // same the older one, or if they're the same, the one with more (hard) links.
+      if ((sb.st_blocks < sa.st_blocks) or (sb.st_mtime > sa.st_mtime) or (sb.st_nlink < sa.st_nlink))
+        {
+          mc_SWAP_ITEMS (sa, sb, s_swap, sizeof (struct stat));  // swap items to keep original sb
+          mc_SWAP_ITEMS (filea, fileb, s_swap, sizeof (char *));  // swap file name pointers
+          b_swap = true;
+        }
+      // before linking: store number of hard links of each file
+      i_nlinka = sa.st_nlink;
+      i_nlinkb = sb.st_nlink;
+      if (1 == sa.st_nlink)  // file a is no (hard) link
+        {
+          Files_deleted++;
+          Blocks_reclaimed += sa.st_blocks;
+        }
+      if (!Nodo and (unlink (filea)))
+        {
+          (void) fprintf (stderr, "unlink(%s) failed\n", filea);
+          perror ("unlink");
+          return (-1);
+        }
+      if (!i_soft)
+        {
+          if (!Nodo and (-1 == link (fileb, filea)))
+            {
+              (void) fprintf (stderr, "link(%s,%s) failed\n", fileb, filea);
+              perror ("link");
+              return (-1);
+            }
+        }
+      else
+        {
+          if (!Nodo and (-1 == symlink (fileb, filea)))
+            {
+              (void) fprintf (stderr, "symlink(%s,%s) failed\n", fileb, filea);  // same argument order as the symlink call
+              perror ("symlink");
+              return (-1);
+            }
+        }
+      if (!Quiet)
+        {
+          (void) printf ("ln %s %s %s: %d, %d -> %d, freed +%llu blocks\n", i_soft ? "-s" : "", fileb, filea, i_nlinkb, i_nlinka, i_nlinka + i_nlinkb,
+                         sb.st_blocks);
+          fflush (stdout);
+        }
+    }
+  return ((b_swap) ?
strncmp (fileb, filea, BUFSIZ) : strncmp (filea, fileb, BUFSIZ));  //
+}  // fcmp
+
+
+// return the index of file with name a_name
+int
+nindex (const char *a_name)
+{
+  void *p_v = bsearch (&a_name, a_names, nfiles, sizeof (char *), cmp0);
+  int i_ret = (int) (((((void *) p_v) - ((void *) a_names))) / (sizeof (char *)));
+  if (NULL == p_v)
+    {
+      // report the arguments actually passed to bsearch (cmp0, not cmp); sizeof is cast to match %d
+      printf ("Error: bsearch(%s, %p, %d, %d, %p) returned NULL.\n", a_name, a_names, nfiles, (int) sizeof (char *), cmp0);
+      return (-1);  // error
+    }
+  return (i_ret);
+}
+
+
+// This is the comparison function called by qsort, for marking all duplicate files with that flag.
+// According to ANSI C the current version is not
+// completely correct: if the same objects are passed more than once to the
+// comparison function, the results must be consistent with one another, 7.20.5 C99.
+// If errors such as a failed lstat occur, this is not fulfilled yet.
+// sort levels: 0: size (and accessibility), 1: device, 2: content, 3: name
+int
+fcmp0 (const void *a, const void *b)
+{
+  struct stat sa, sb;
+  FILE *fa, *fb;
+  int i1, i2;
+  int rval = 0, ja = 0, jb = 0, i_fa, i_fb;
+  const char *filea, *fileb;
+  union u_tag u_buffer1 = {
+    0
+  }, u_buffer2 =
+  {
+  0};
+  void *buffer1 = (void *) &u_buffer1;
+  void *buffer2 = (void *) &u_buffer2;
+
+  if (i_exit_flag)
+    exit (0);
+  // Nonexistent or non-plain files are less than any other file
+  if (NULL == a)
+    return (-1);
+  filea = *(const char **) a;
+  if (-1 == lstat (filea, &sa))  // or ! S_ISREG (sa.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", filea);
+      perror ("lstat");
+      return (-1);
+    }
+  if (NULL == b)
+    return 1;
+  fileb = *(const char **) b;
+  if (-1 == lstat (fileb, &sb))  // or !
S_ISREG (sb.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", fileb);
+      perror ("lstat");
+      return (1);
+    }
+  i_minlinks = mc_MIN (i_minlinks, mc_MIN (sa.st_nlink, sb.st_nlink));
+  i_maxlinks = mc_MAX (i_maxlinks, mc_MAX (sa.st_nlink, sb.st_nlink));
+  // Smaller files are "less"
+  if (sa.st_size < sb.st_size)
+    return (-1);
+  if (sa.st_size > sb.st_size)
+    return (1);
+  // We now know both files exist, are plain files, are the same size
+  // if both files are hard linked: sort by name
+  if (((sa.st_dev == sb.st_dev) and (sa.st_ino == sb.st_ino)))
+    {
+    out_equal:
+      // set ja, jb to the index of a, b
+      ja = nindex (filea);
+      jb = nindex (fileb);
+      if ((ja >= 0) and (a_flags[ja] bitand 1))
+        return (-1);  // marked file first
+      if ((jb >= 0) and (a_flags[jb] bitand 1))
+        return (1);  // marked file first
+      // now both files are not marked
+      if ((ja >= 0) and (jb >= 0))
+        {
+          if ((b_inv ? -1 : 1) * strncmp (filea, fileb, BUFSIZ) <= 0)  // alphabetic ordering
+            {
+              a_flags[ja] = 1;
+              if (!Quiet)
+                {
+                  (void) printf ("Equal files: %s will get deleted, %s will get preserved.\n", filea, fileb);
+                  fflush (stdout);
+                }
+              return (-1);  // marked file first
+            }
+          else
+            {
+              a_flags[jb] = 1;
+              if (!Quiet)
+                {
+                  (void) printf ("Equal files: %s will get deleted, %s will get preserved.\n", fileb, filea);
+                  fflush (stdout);
+                }
+              return (1);  // marked file first
+            }
+        }
+      return ((b_inv ? -1 : 1) * strncmp (filea, fileb, BUFSIZ));
+    }
+  // We now know both files exist, are plain files, are the same size > 0,
+  // and are not already linked, so compare them alphabetically
+  if (NULL == (fa = fopen (filea, "r")))
+    return (-1);  // Unreadable files are "less than"
+  if (NULL == (fb = fopen (fileb, "r")))
+    {
+      fclose (fa);
+      return 1;
+    }
+  // Loop for comparing the files in 64 bit (instead of 8 bit) blocks.
+  // On big endian machines it's alphabetic ordering, on little endian machines it's
+  // alphabetic ordering with 64 bit 'characters'.
+  while (((i1 = fread (buffer1, 1, 8, fa)) != 0) and ((i2 = fread (buffer2, 1, 8, fb)) != 0))
+    {
+      // Mask the unused 1..7 bytes, because the standard says nothing about them.
+      memset (&(u_buffer1.c[i1]), 0, 8 - i1);  // start at the first unused byte buffer1.c[i1], end at Byte7 buffer1.c[7]
+      memset (&(u_buffer2.c[i2]), 0, 8 - i2);
+      // check for file errors
+      i_fa = ferror (fa);
+      i_fb = ferror (fb);
+      if (i_fa or i_fb)  // check for file errors
+        {
+          if (i_fa)
+            {
+              (void) fprintf (stderr, "file %s: error %d.\n", filea, i_fa);
+              rval = -1;  // error file: smaller
+              break;
+            }
+          if (i_fb)
+            {
+              (void) fprintf (stderr, "file %s: error %d.\n", fileb, i_fb);
+              rval = 1;
+              break;
+            }
+        }
+      if (u_buffer1.i64 != u_buffer2.i64)  // compare
+        {
+          rval = (u_buffer1.i64 < u_buffer2.i64) ? -1 : 1;
+          break;
+        }
+    }
+  (void) fclose (fa);
+  (void) fclose (fb);
+  if (rval)  // unequal files of same size
+    return rval;
+  // else
+  goto out_equal;
+}  // fcmp0
+
+
+// This is the hard link expansion function for the inverse mode. Needs only one file name as argument.
+int
+fcmp1 (const void *a)
+{
+  struct stat sa;
+  FILE *fa, *fb;
+  int i1, i, i_b, i_fa, i_fb, i_ret = 0;
+  char a_cd[BUFSIZ] = { '\0' };  // for directory
+  char a_cf[BUFSIZ] = { '\0' };  // for tempfile
+  const char *filea, *fileb = a_cf;  // file names, fileb after mkstemp
+  union u_tag u_buffer1 = {
+    0
+  };
+  void *buffer1 = (void *) &u_buffer1;
+
+  if (i_exit_flag)
+    exit (0);
+  // Nonexistent or non-plain files are less than any other file
+  if (NULL == a)
+    return (-1);
+  filea = *(const char **) a;
+  if (-1 == lstat (filea, &sa))  // or !
S_ISREG (sa.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", filea);
+      perror ("lstat");
+      return (-1);
+    }
+  i_minlinks = mc_MIN (i_minlinks, sa.st_nlink);
+  i_maxlinks = mc_MAX (i_maxlinks, sa.st_nlink);
+  // check for hard links
+  if (sa.st_nlink > 1)  // hard link found
+    {
+      i_files_expanded++;
+      i_blocks_declaimed += sa.st_blocks;
+      if (!Nodo)  // expand the hard link
+        {
+          strncpy (a_cd, filea, BUFSIZ);
+          // extract the directory from the file name: delete from end to last "/"
+          for (i = strnlen (a_cd, BUFSIZ); i >= 0; i--)
+            {
+              if (a_cd[i] != '/')
+                a_cd[i] = '\0';
+              else
+                break;
+            }
+          // now the actual directory is in a_cd, create temporary file there
+          strncpy (a_cf, a_cd, BUFSIZ);
+          strncat (a_cf, "dup__XXXXXXX", BUFSIZ / 2);  // template for mkstemp
+          i_b = mkstemp (a_cf);
+          if (-1 == i_b)
+            {
+              (void) fprintf (stderr, "Could not create temporary file.\n");
+              perror ("mkstemp");
+              return (-1);
+            }
+          if ((NULL == (fa = fopen (filea, "r"))) or (NULL == (fb = fdopen (i_b, "w"))))  // open file and tmpfile
+            {
+              (void) fprintf (stderr, "Expanding the hard link(%s) failed.\n", filea);
+              perror ("expand");
+              return (-1);
+            }
+          // expand to tmpfile
+          while ((i1 = fread (buffer1, 1, 8, fa)) != 0)  // copy fa to fb in 8 byte blocks
+            {
+              (void) fwrite (buffer1, 1, i1, fb);
+            }
+          i_fa = ferror (fa);
+          i_fb = ferror (fb);
+          if (i_fa or i_fb)  // check for file errors
+            {
+              if (i_fa)
+                (void) fprintf (stderr, "file %s: error %d.\n", filea, i_fa);
+              if (i_fb)
+                (void) fprintf (stderr, "file %s: error %d.\n", fileb, i_fb);
+              i_ret = -1;
+            }
+          (void) fclose (fa);
+          (void) fclose (fb);
+          if (i_ret)  // copy failed: keep the original and drop the incomplete tempfile
+            {
+              (void) unlink (fileb);
+              return (i_ret);
+            }
+          // unlink
+          if (unlink (filea))
+            {
+              (void) fprintf (stderr, "unlink(%s) failed, tempfile %s remains\n", filea, fileb);
+              perror ("unlink");
+              return (-1);
+            }
+          // rename to original name
+          if (rename (fileb, filea))
+            {
+              (void) fprintf (stderr, "rename(%s, %s) failed, tempfile %s remains\n", fileb, filea, fileb);
+              perror ("rename");
+              return (-1);
+            }
+        }  // if (!Nodo)
+      if
(!Quiet)
+        {
+          (void) printf ("expand %s: %d -> %d, unfreed %llu blocks\n", filea, sa.st_nlink, sa.st_nlink - 1, sa.st_blocks);  // sa has the old cached value
+          fflush (stdout);
+        }
+    }  // if (sa.st_nlink > 1)
+  return (i_ret);
+}  // fcmp1
+
+
+// This is the (de)sparsing function.
+// Copying sparse/desparse can reduce/increase st_blocks while st_size remains constant.
+int
+fcmp2 (const void *a, const _Bool b_S)
+{
+  FILE *fb;
+  struct stat sa, sb;
+  int i, i_b, status = 0, i_pid, i_ret = 0;
+  long int li;
+  char a_cd[BUFSIZ] = { '\0' };  // for directory, temporary string
+  char a_cf[BUFSIZ] = { '\0' };  // for tempfile
+  char a_ca[BUFSIZ] = { '\0' };  // for a copy of the original file name
+  const char *filea = a_ca, *fileb = a_cf;  // file names
+
+  if (i_exit_flag)
+    exit (0);
+  // Nonexistent or non-plain files are less than any other file
+  if (NULL == a)
+    return (-1);
+  strncpy (a_ca, *(const char **) a, BUFSIZ - 1);
+  // filea = *(const char **) a;
+  if (-1 == lstat (filea, &sa))  // or ! S_ISREG (sa.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", filea);
+      perror ("lstat");
+      return (-1);
+    }
+  // get tmpname: mkstemp, unlink
+  strncpy (a_cd, filea, BUFSIZ);
+  // extract the directory from the file name: delete from end to last "/"
+  for (i = strnlen (a_cd, BUFSIZ); i >= 0; i--)
+    {
+      if (a_cd[i] != '/')
+        a_cd[i] = '\0';
+      else
+        break;
+    }
+  // now the actual directory is in a_cd, create temporary file there
+  strncpy (a_cf, a_cd, BUFSIZ);
+  strncat (a_cf, "dup__XXXXXXX", BUFSIZ / 2);  // template for mkstemp
+  i_b = mkstemp (a_cf);
+  if (-1 == i_b)
+    {
+      (void) fprintf (stderr, "Could not create temporary file.\n");
+      perror ("mkstemp");
+      return (-1);
+    }
+  // dummy open and close to close the tmpfile
+  if (NULL == (fb = fdopen (i_b, "w")))  // open tmpfile
+    {
+      (void) fprintf (stderr, "fdopen (%d) failed.\n", i_b);
+      perror ("fdopen");
+      return (-1);
+    }
+  fclose (fb);
+  if (unlink (fileb))  // delete file because we need only the name
+    {
+      (void) fprintf (stderr,
"unlink(%s) failed\n", fileb);
+      perror ("unlink");
+      return (-1);
+    }
+  switch (i_pid = vfork ())
+    {
+    case -1:  // parent with fork error
+      perror ("vfork");
+      fprintf (stderr, "vfork failed.\n");
+      return (-1);  // no child exists, so there is nothing to wait for
+    case 0:  // child
+      memset (a_cd, '\0', sizeof (a_cd));
+      strncpy (a_cd, (b_S ? "--sparse=always" : "--sparse=never"), BUFSIZ);
+      i = execl ("/bin/cp", "cp", a_cd, filea, fileb, (char *) NULL);
+      if (i)
+        {
+          fprintf (stderr, "execl(cp %s %s %s) failed, return value %d\n", a_cd, filea, fileb, i);
+          perror ("execl");
+        }
+      _exit (i);
+      break;
+    default:  // parent with no error: do proceed
+      break;
+    }
+  i = waitpid (i_pid, &status, WUNTRACED);  // wait till child terminates
+  if (status)
+    {
+      fprintf (stderr, "copying failed; child with pid %d returned %d\n", i, status);
+      i_ret = -1;
+      goto end;
+    }
+  // lstat and check copy
+  if (-1 == lstat (fileb, &sb))  // or ! S_ISREG (sb.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", fileb);
+      perror ("lstat");
+      return (-1);
+    }
+  li = sa.st_blocks - sb.st_blocks;
+  if (not b_S)
+    li = -li;
+  li_minblocks = mc_MIN (li_minblocks, li);
+  li_maxblocks = mc_MAX (li_maxblocks, li);
+  // if shrunk in sparse mode or expanded in desparse mode: unlink original and rename tmpfile to original name
+  if ((b_S ? (sa.st_blocks > sb.st_blocks) : (sa.st_blocks < sb.st_blocks)))
+    {
+      b_S ? Files_deleted++ : i_files_expanded++;
+      b_S ?
(Blocks_reclaimed += sa.st_blocks - sb.st_blocks) : (i_blocks_declaimed += sb.st_blocks - sa.st_blocks);
+      if (!Quiet)
+        {
+          (void) printf ("%s: %lld -> %lld blocks\n", filea, sa.st_blocks, sb.st_blocks);
+          fflush (stdout);
+        }
+      if (not Nodo)
+        {
+          if (unlink (filea))  // delete original
+            {
+              (void) fprintf (stderr, "unlink(%s) failed\n", filea);
+              perror ("unlink");
+              return (-1);
+            }
+          // rename to original name
+          if (rename (fileb, filea))
+            {
+              (void) fprintf (stderr, "rename(%s, %s) failed\n", fileb, filea);
+              perror ("rename");
+              i_ret = -1;
+              goto end;
+            }
+          return (0);
+        }
+    }
+end:
+  if (unlink (fileb))  // delete tmpfile
+    {
+      (void) fprintf (stderr, "unlink(%s) failed\n", fileb);
+      perror ("unlink");
+      return (-1);
+    }
+  return (i_ret);
+}  // fcmp2
+
+
+// return true if the two files are not hard linked
+_Bool
+different_files (const void *a, const void *b)
+{
+  struct stat sa, sb;
+  const char *filea, *fileb;
+
+  // Nonexistent or inaccessible files are treated as different
+  if (NULL == a)
+    return (true);
+  filea = *(const char **) a;
+  if (-1 == lstat (filea, &sa))  // or ! S_ISREG (sa.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", filea);
+      perror ("lstat");
+      return (true);
+    }
+  if (NULL == b)
+    return (true);
+  fileb = *(const char **) b;
+  if (-1 == lstat (fileb, &sb))  // or ! S_ISREG (sb.st_mode))
+    {
+      fprintf (stderr, "lstat(%s) failed\n", fileb);
+      perror ("lstat");
+      return (true);
+    }
+  if ((sa.st_dev == sb.st_dev) and (sa.st_ino == sb.st_ino))  // same device and same inode?
+    return (false);  // yes: equal files
+  return (true);
+}  // different_files
diff --git a/readme.txt b/readme.txt
new file mode 100644
index 0000000..a7ad5d3
--- /dev/null
+++ b/readme.txt
@@ -0,0 +1,45 @@
+dupmerge overview
+=================
+
+Dupmerge reads a list of files from standard input (e.g., as produced by
+"find . -print0") and looks securely for identical files.
When it finds
+two or more identical files, all but one are unlinked to reclaim the
+disk space and recreated as hard links to the remaining copy.
+
+Remarks:
+dupmerge should be used only for backups or archives, where duplicate
+files are not needed; it should not be used without nodo mode for /home,
+/tmp, /var and most other directories.
+The normal mode, hard linking of multiple files, causes no problems in backups
+or archives and can also be used on CDs/DVDs. On filesystems without hard
+links, e. g. FAT (FAT12, FAT16, FAT32, VFAT ...), it can work only with soft
+links (often called shortcuts).
+The sparse mode never causes problems (on file systems which support sparse).
+The deletion mode can cause trouble e. g. with ebooks or html documents with
+pictures that occur multiple times. Therefore the deletion mode should only be
+used with files which are not associated, e. g. audio or video files. The deletion
+mode works on all (writable) file systems.
+
+Normal mode: Saves approx. 20 % space.
+
+Sparse mode: Saves approx. 0.2 % space.
+
+Deletion mode: Deletes approx. 10 % of the files.
+
+Many similar programs can be found on freshmeat.net or sourceforge.net by
+searching for "duplicate".
+I found clink, dmerge, duff, Dupseek, epac, fdf, fdfind, fdupe, fdupes,
+find_duplicates, freedup, freedups, fslint, ftwin, highlnk, WeedIt, and whatpix.
+
+Most of these programs are not secure: highlnk and FSlint use md5sum,
+which is a cryptographically weak hash, and therefore they are vulnerable to md5sum
+collisions. With the hashing they are fast (O(n)) but not safe.
+Another point is handling file names as zero-terminated strings to avoid problems
+with stray filenames, which dupmerge does correctly.
+
+If you want to delete all hard links (regular files with more than one hard
+link), you only have to type
+find . -type f -links +1 -exec rm -- {} \;
+
+
+RF, 2007-10-29
-- 
2.20.1
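
The readme's two safety checks (files that share a device and inode are already linked, and candidates are compared by content instead of by an md5 hash) can be illustrated with a minimal sketch in C. This is not the code from dupmerge.c: the helper names same_inode and streams_equal are hypothetical, and the 4096-byte buffer is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both stat records describe the same
   inode on the same device, i.e. the two names are already hard links. */
int same_inode(const struct stat *sa, const struct stat *sb)
{
    assert(sa != NULL && sb != NULL);
    return sa->st_dev == sb->st_dev && sa->st_ino == sb->st_ino;
}

/* Hypothetical helper: compares two streams byte for byte from their
   current positions. Returns 1 if equal, 0 if different, -1 on read error. */
int streams_equal(FILE *fa, FILE *fb)
{
    unsigned char bufa[4096], bufb[4096];

    assert(fa != NULL && fb != NULL);
    for (;;) {
        size_t na = fread(bufa, 1, sizeof bufa, fa);
        size_t nb = fread(bufb, 1, sizeof bufb, fb);

        if (ferror(fa) || ferror(fb))
            return -1;              /* never report "equal" after an error */
        if (na != nb || memcmp(bufa, bufb, na) != 0)
            return 0;               /* different length or different bytes */
        if (na == 0)
            return 1;               /* both streams ended without a mismatch */
    }
}
```

A caller would typically lstat both names first, skip the pair when same_inode reports an existing link, and only link the files when streams_equal returns 1.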