October, 2007.
Python File Indexer
pyIndex is a Python script and a database configuration that allows you to index files. It was originally developed for a source code search engine for the SAMATE Reference Dataset.
The method uses a MySQL database, described below, to store the words and their references. The script can do either lazy indexing (index only a given directory) or full indexing (index every directory).
When I say 'directory', I mean an ID-based directory:
for ID = 4242, the directory is ./000/004/242/*.*
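A minimal sketch of this ID-to-directory mapping in Python, assuming IDs never exceed nine digits: the ID is zero-padded to nine digits and split into three groups of three. The helper names id_to_dir and dir_to_id are hypothetical and are not functions from pyIndex itself.

```python
import os

def id_to_dir(file_id, root="."):
    """Map a numeric ID to its directory, e.g. 4242 -> ./000/004/242.

    Assumption: IDs fit in 9 digits (0 to 999,999,999).
    """
    if not 0 <= file_id <= 999999999:
        raise ValueError("ID must fit in 9 digits")
    digits = "%09d" % file_id  # 4242 -> "000004242"
    return os.path.join(root, digits[0:3], digits[3:6], digits[6:9])

def dir_to_id(path):
    """Inverse mapping: rebuild the ID from the last three path components."""
    parts = os.path.normpath(path).split(os.sep)[-3:]
    return int("".join(parts))
```

On a POSIX system, id_to_dir(4242) returns './000/004/242', and dir_to_id turns that path back into 4242.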
Installing pyIndex
To install this script, you only need the MySQLdb module and a database set up as follows:
-- Words storage table
CREATE TABLE `words` (
  `WordID` int(10) NOT NULL auto_increment,
  `Word` text collate latin1_general_ci NOT NULL,
  PRIMARY KEY (`WordID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- Relation storage table
CREATE TABLE `words2id` (
  `WordsCrossID` int(10) NOT NULL auto_increment,
  `WordsID` int(10) NOT NULL,
  `ID` int(10) NOT NULL,
  PRIMARY KEY (`WordsCrossID`),
  UNIQUE KEY `WordsID` (`WordsID`,`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
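To illustrate how the two tables work together, here is a hedged sketch of indexing one file's text: each distinct word gets a row in words, and each (word, file ID) pair gets one row in words2id. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs without a server, and the tokenizer (identifier-like runs of letters, digits and underscores) is an assumption; pyIndex's real word splitting is not described here. Note that the MySQL column uses the case-insensitive latin1_general_ci collation, while SQLite's default comparison is case-sensitive.

```python
import re
import sqlite3

# SQLite stand-in for the MySQL tables above (same columns and unique key).
SCHEMA = """
CREATE TABLE words (
    WordID INTEGER PRIMARY KEY AUTOINCREMENT,
    Word   TEXT NOT NULL
);
CREATE TABLE words2id (
    WordsCrossID INTEGER PRIMARY KEY AUTOINCREMENT,
    WordsID      INTEGER NOT NULL,
    ID           INTEGER NOT NULL,
    UNIQUE (WordsID, ID)
);
"""

# Assumed tokenizer: identifier-like words only (numbers are skipped).
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def index_text(conn, file_id, text):
    """Store every distinct word of `text` and link it to `file_id`."""
    for word in set(WORD_RE.findall(text)):
        row = conn.execute("SELECT WordID FROM words WHERE Word = ?", (word,)).fetchone()
        if row is None:
            word_id = conn.execute("INSERT INTO words (Word) VALUES (?)", (word,)).lastrowid
        else:
            word_id = row[0]
        # The UNIQUE (WordsID, ID) key makes re-indexing the same file a no-op.
        conn.execute("INSERT OR IGNORE INTO words2id (WordsID, ID) VALUES (?, ?)",
                     (word_id, file_id))
    conn.commit()
```

For example, after conn = sqlite3.connect(':memory:') and conn.executescript(SCHEMA), calling index_text(conn, 4242, 'int main(void) { return 0; }') stores four words (int, main, void, return) linked to ID 4242, and running it again adds nothing. With MySQLdb, the placeholders would be %s instead of ?, and INSERT IGNORE plays the role of SQLite's INSERT OR IGNORE.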
How do I use pyIndex?
python pyIndex.py <ID/build/rebuild>
This command stores the words and their references in the database. To use the results, you only need to run a simple SQL query such as:
SELECT t.ID FROM words2id AS t, words AS w WHERE w.Word LIKE '%SEARCH_WORD%' AND t.WordsID = w.WordID GROUP BY t.ID
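The search can also be issued from Python. This is a minimal sketch assuming `conn` is a DB-API connection to a database holding the two tables. It joins words2id.WordsID to words.WordID and passes the search term as a bound parameter rather than pasting it into the SQL string. The function name `search` is hypothetical, and the `?` placeholder is sqlite3's style (MySQLdb uses `%s`).

```python
import sqlite3

# words2id.WordsID references words.WordID; words2id.ID is the file ID.
SEARCH_SQL = """
SELECT t.ID
FROM words2id AS t, words AS w
WHERE w.Word LIKE ? AND t.WordsID = w.WordID
GROUP BY t.ID
"""

def search(conn, term):
    """Return the sorted file IDs that contain a word matching *term*."""
    # Escaping of % and _ inside `term` is left out of this sketch.
    rows = conn.execute(SEARCH_SQL, ("%" + term + "%",)).fetchall()
    return sorted(row[0] for row in rows)
```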
You can change how the word is matched, for example with MySQL's phonetic comparison (which compares the SOUNDEX codes of both strings):
w.Word SOUNDS LIKE 'SEARCH_WORD', etc.
Download pyIndex
Romain Gaucher - r@rgaucher.info