Lines Matching +full:system +full:- +full:observe
6 ------------
11 file type requires heavy-duty semantic analysis on the file contents.
15 Previous versions of PKZip and other zip-compatible compression tools
19 limitation of this scheme is the restriction to Latin-based alphabets.
29 a much increased precision and a near-100% recall. This scheme is
30 designed to work on ASCII, Unicode and other ASCII-derived alphabets,
31 and it handles single-byte encodings (ISO-8859, MacRoman, KOI8, etc.)
32 and variable-sized encodings (ISO-2022, UTF-8, etc.). Wider encodings
33 (UCS-2/UTF-16 and UCS-4/UTF-32) are not handled, however.
37 -------------
41 - The white list of textual bytecodes:
43 - The gray list of tolerated bytecodes:
45 - The black list of undesired, non-textual bytecodes:
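
The three lists above can be sketched as a small classifier. This is a minimal illustration, not the author's implementation. The listing names the lists but not their members, so the byte values below are assumptions, as is the decision rule (plain text means at least one white-listed byte and no black-listed byte). The name is_plain_text is hypothetical.

```python
# Assumed byte classes: the listing names the three lists but omits their members.
WHITE = frozenset({9, 10, 13}) | frozenset(range(32, 256))  # TAB, LF, CR, SPACE..255
GRAY = frozenset({7, 8, 11, 12, 26, 27})                    # BEL, BS, VT, FF, SUB, ESC
BLACK = frozenset(range(256)) - WHITE - GRAY                # 0..6, 14..25, 28..31


def is_plain_text(data: bytes) -> bool:
    """Hypothetical helper applying the assumed rule.

    Returns True when the data contains at least one white-listed byte
    and no black-listed byte. Gray-listed bytes are tolerated but do not
    count as evidence of text, so empty or gray-only input is binary.
    """
    seen = set(data)  # distinct byte values present in the input
    if seen & BLACK:
        return False
    return bool(seen & WHITE)


if __name__ == "__main__":
    for sample in (b"plain text\n", b"\x00\x01binary", b""):
        print(sample, is_plain_text(sample))
```

Under these assumptions every byte from 128 to 255 is white-listed, so single-byte and UTF-8 encoded text passes, while UTF-16 text containing ASCII characters is rejected because of its NUL bytes.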
55 ---------
59 The first observation is that, although the full range of 7-bit codes
62 widely-used, almost universally-portable control codes are 9 (TAB),
72 detection schemes observe the presence of non-ASCII codes from the range
78 used for encoding non-Latin scripts.
84 results on a text encoded, say, using ISO-8859-16 versus UTF-8.)
87 one or more black-listed codes, either by mistake or by peculiar design
89 of black-listed codes would provide an increased recall (i.e. more true
93 be regarded as binary by general-purpose text detection schemes, because
94 general-purpose text processing algorithms might not be applicable.
96 a near-100% recall.
99 and applications. We tried plain text files, system logs, source code,
105 --
107 Last updated: 2006-May-28