ó Û`W[c@ s8ddlmZddlmZddlZddlZddlZddlZddlZdZeed„Z de fd„ƒYZ dd „Z ed „Z ed d d „Zedkr4ejdZejdZejdZejdZdZedededeZgejeƒD]<Zejjejjeeƒƒr%ejjeeƒ^q%ZgeD]Zeekrne^qnZeeƒZdedZ e e ƒZ!dZ"ddddgZ#dZ$gZ%e e"e#ej&ƒƒZ'e e$e%ej&ƒƒZ(e)deƒZ*e*dZ+ee!e'e(ge+ƒe e!e"ƒZ,e e!e$ƒZ-xWe.de*ƒD]FZ/e0e/ƒej1edede2e/ƒdƒZ3e e,e3ƒqcWdede3j4dZ5edkrðej6e5fd ej7ƒZ8n*ed!krej9e5fd ej7ƒZ8ne e-e8ƒe!j:ƒndS("iÿÿÿÿ(tdivision(tprint_functionNt defaultNodecC sCddl}|j|ddddƒ}|j|j||ƒ}|S(s6 toDiskName: the name of the file on disk iÿÿÿÿNtmodetwttitletDataset(ttablestopenFilet createGrouptroot(t toDiskNamet groupNametgroupDescriptionRth5filetgcolumns((s ../data_process/store_pytable.pyt init_h5_file s tInfoToInitArrayOnH5FilecB seZd„ZRS(cC s||_||_||_dS(s name: the name of this matrix shape: tuple indicating the shape of the matrix (similar to numpy shapes) atomicType: one of the pytables atomic types - eg: tables.Float32Atom() or tables.StringAtom(itemsize=length); N(tnametshapet atomicType(tselfRRR((s ../data_process/store_pytable.pyt__init__s  (t__name__t __module__R(((s ../data_process/store_pytable.pyRsiˆcC s‰t|ƒ}t|t|ƒƒ|}x\td||ƒD]H}||krY|||n||}|j|||!ƒtjƒq9WdS(s= Going to write to disk in batches of batch_size iN(tlentinttfloattxrangetappendRtflush(t theH5Columnt whatToWritet batch_sizet data_sizetlasttitstop((s ../data_process/store_pytable.pyt writeToDisk s  cC s|jdtƒ}t||ƒS(Nt/(tgetNodetDEFAULT_NODE_NAMEtgetattr(Rt columnNametnodeNametnode((s ../data_process/store_pytable.pyt getH5column,stbloscic C s’|j|j|ƒ}tjd|d|ƒ}x^|D]V}dg} | j|jƒ|j||jd|jd| d|jd|d|ƒq4Wd S( s h5file: filehandle to the h5file, initialised with init_h5_file infoToInitArrayOnH5File: array of instances of InfoToInitArrayOnH5File expectedRows: this code is set up to work with EArrays, which can be extended after creation. (presumably, if your data is too big to fit in memory, you're going to have to use EArrays to write it in pieces). "sizeEstimate" is the estimated size of the final array; it is used by the compression algorithm and can have a significant impace on performance. nodeName: the name of the node being written to. complib: the docs seem to recommend blosc for compression... complevel: compression level. Not really sure how much of a difference this number makes... tcomplibt complevelitatomRRtfilterst expectedrowsN( R(R RtFilterstextendRt createEArrayRR( RtinfoToInitArraysOnH5Filet expectedRowsR,R0R1RR3tinfoToInitArrayOnH5Filet finalShape((s ../data_process/store_pytable.pytinitColumnsOnH5File1s   t__main__iiiis../data/numpy_arrayst.s0../data/PROSITE_TP_TN/pytables_Voxel/train_data_s .pytablestdataitlabelièiR't_s.dattnegtdtypetpos(;t __future__RRRtnumpytsystostrandomR)RtobjectRR&R.R<Rtargvt target_RESt target_ATOMtsitet pos_or_negt input_dirtIDtlistdirtftpathtisfiletjointfilesttRt total_numtfilename_trainRtdataNamet dataShapet labelNamet labelShapet Float32AtomtdataInfot labelInfotmint num_of_datt numSamplest dataColumnt labelColumntrangetdat_numtprinttloadtstrtXRt actual_sizetzerostfloat32tytonestclose(((s ../data_process/store_pytable.pyts\            R%    )
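

# ---------------------------------------------------------------------------
# Illustrative sketch, not part of the original script: how a consumer might
# read back the columns written by the main block above. It reuses this
# module's own getH5column helper and the same PyTables 2.x openFile API;
# "pytables_path" is whatever path was passed to init_h5_file, and the column
# names match dataName/labelName above.
# ---------------------------------------------------------------------------
def example_read(pytables_path, node_name=DEFAULT_NODE_NAME):
    h5 = tables.openFile(pytables_path, mode="r")
    try:
        data_col = getH5column(h5, "data", nodeName=node_name)
        label_col = getH5column(h5, "label", nodeName=node_name)
        print("data:", data_col.shape, "labels:", label_col.shape)
        # EArrays support numpy-style slicing; a slice is materialised as an
        # in-memory numpy array, so it stays valid after the file is closed.
        first_data = data_col[0:32]
        first_labels = label_col[0:32]
        return first_data, first_labels
    finally:
        h5.close()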