Copyright (C) 2006. Oleg Kobchenko. All rights reserved.
XML parser addon based on the Expat 2.0.0 library. It provides both a flat API and an object-oriented, SAX-like interface. Binaries for Windows, Linux x86 and Darwin PPC are included.
SAX parsing follows the push model, i.e. the API calls you. You provide the callback functions by overriding those of the base class; see the saxclass definition. These functions are called on XML node events, such as the start and end of an element.
A higher-level visitor design pattern can be obtained by defining verbs named after the elements of interest, with a prefix, and calling them from startElement and endElement. This is similar to the way wd calls event verbs.
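A minimal sketch of this visitor idea, assuming the handler conventions used in the listings of this document (startElement is dyadic with the element name in y, endElement is monadic). The class name pvisit, the helper dispatch and the prefixes open_ and close_ are hypothetical and not part of the addon:

```j
require '~addons/xml/sax.ijs'
saxclass 'pvisit'                 NB. hypothetical class name

NB. x: verb name, y: argument; call the verb only if it is defined
dispatch=: 4 : 0
if. 3 = 4!:0 <x do. x~ y else. EMPTY end.
)

startDocument=: 3 : 'N=: 0'
startElement=: 4 : '(''open_'',y) dispatch x'
endElement=: 3 : '(''close_'',y) dispatch y'
endDocument=: 3 : 'N'

NB. visitor verbs: only elements of interest need one
open_item=: 3 : 'N=: N+1'         NB. y is the attributes argument, unused here

cocurrent 'base'
```

Elements without a matching verb are skipped silently, because 4!:0 reports a name class other than 3 (verb) for undefined or invalid names.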
In your class you maintain the state and selectively process the events. The event for text between tags is called characters. It is demonstrated in the table and rss examples.
In the rss example, a simple stack of nested elements is maintained in the list S. The characters handler then processes the text according to the current context.
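The context test used in the rss example can be tried on its own, without the parser. A small sketch with a made-up element path; S is an ordinary boxed list, pushed to with , and popped with }: as in the rss listing, and inctx is a hypothetical helper introduced only for illustration:

```j
S=: ''                              NB. empty stack, as in startDocument
S=: S,<'rss'                        NB. push on startElement
S=: S,<'channel'
S=: S,<'title'
inctx=: 4 : 'x -: _2{.y'            NB. hypothetical helper: do the top two entries match x?
(;:'channel title') inctx S         NB. 1: text here is the channel title
S=: }:S                             NB. pop on endElement
(;:'channel title') inctx S         NB. 0: the top two entries are now rss, channel
```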
The result of process is whatever endDocument returns; endDocument is the last event called.
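As a sketch of returning a result this way, the hypothetical class pnames (not part of the addon) collects element names and hands the distinct ones back from endDocument:

```j
require '~addons/xml/sax.ijs'
saxclass 'pnames'                       NB. hypothetical class name

startDocument=: 3 : 'NAMES=: 0$a:'      NB. state: empty boxed list
startElement=: 4 : 'NAMES=: NAMES,<y'   NB. record every element name
endDocument=: 3 : '~.NAMES'             NB. result of the whole parse

cocurrent 'base'
ELEMS=: process_pnames_ '<a><b/><c/><b/></a>'
```

ELEMS should then hold the boxed names a, b and c, assuming process passes the value of endDocument through unchanged.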
The following are listings and results of some of the examples found in the test folder.
NB. object oriented sax parser specialization
NB. extended to use attributes and levels
require '~addons/xml/sax.ijs'
saxclass 'psax2'
showattrs=: (''"_)`(;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)
startDocument=: 3 : 0
L=: 0
)
startElement=: 4 : 0
smoutput (L#' '),'[',y,' ',(showattrs attributes x),']'
L=: L+1
)
endElement=: 3 : 0
L=: L-1
smoutput (L#' '),'[/',y,']'
)
NB. =========================================================
cocurrent 'base'
TEST1=: 0 : 0
<root><test a="11"/><test b="12"/></root>
)
0 : 0 NB. Test
process_psax2_ TEST1
process_psax2_ fread jpath '~addons/xml/test/chess.xml'
)
   process_psax2_ TEST1
[root]
 [test a=11]
 [/test]
 [test b=12]
 [/test]
[/root]
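The tacit verb showattrs from the psax2 listing is dense, so here is a sketch of how it behaves on its own. It assumes that attributes returns a boxed table with one name and value pair per row; that shape is inferred from the "1 in the definition and is not confirmed by the source. The table A is made up:

```j
showattrs=: (''"_)`(;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)

A=: 2 2$'a';'11';'b';'12'    NB. hypothetical attribute table: name, value per row
showattrs A                  NB. each row becomes name=value, rows joined by spaces
showattrs 0 2$a:             NB. no attributes: the empty string
```

The agenda @.(*@#) selects the constant ''"_ when the table has no rows, and the joining verb otherwise.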
NB. object oriented sax parser specialization
NB. extended to use text characters
require '~addons/xml/sax.ijs'
saxclass 'psax3'
showattrs=: (''"_)`(}.@;@:((',' , [ , '='"_ , ])&.>/"1))@.(*@#)
startDocument=: 3 : 0
L=: 0
IGNOREWS=: 1
)
startElement=: 4 : 0
smoutput (L#' '),'',y,'(',(showattrs attributes x),') {'
L=: L+1
)
endElement=: 3 : 0
L=: L-1
smoutput (L#' '),'}'
)
characters=: 3 : 0
smoutput (L#' '),y
)
NB. =========================================================
cocurrent 'base'
TEST3=: 0 : 0
<body><p a="11">s123</p>Between<q b="12" c="3">z456</q></body>
)
0 : 0 NB. Test
process_psax3_ TEST3
process_psax3_ fread jpath '~addons/xml/test/table.xml'
)
   process_psax3_ TEST3
body() {
 p(a=11) {
  s123
 }
 Between
 q(b=12,c=3) {
  z456
 }
}
NB. using element character content
NB. inter-tag and surrounding whitespace is ignored
require '~addons/xml/sax.ijs format'
saxclass 'ptable'
endElement=: 3 : 0
if. y-:'tr' do. TD=: '' [ TR=: TR,TD end.
)
characters=: 3 : 'TD=: TD,<y'
startDocument=: 3 : 'TR=: empty TD=: i.0 [ IGNOREWS=: 1'
endDocument=: 3 : 'TR'
NB. =========================================================
cocurrent 'base'
TEST4=: 0 : 0
<table><tr> <td>0 0 </td> <td> 0 1</td> </tr>
<tr> <td>1 0 </td> <td> 1 1</td> </tr></table>
)
0 : 0 NB. Test
process_ptable_ TEST4
process_ptable_ fread jpath '~addons/xml/test/table.xml'
)
   process_ptable_ TEST4
+---+---+
|0 0|0 1|
+---+---+
|1 0|1 1|
+---+---+
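The way ptable builds its result can be sketched without the parser: TD collects the boxed cell texts of the current row, and TR gains one row each time a tr element closes. The cell values are those of TEST4, typed in by hand for illustration:

```j
TR=: i.0 0                 NB. what empty returns: a table with no rows
TD=: i.0                   NB. no cells yet
TD=: TD,<'0 0'             NB. characters event for the first td
TD=: TD,<'0 1'             NB. characters event for the second td
TD=: '' [ TR=: TR,TD       NB. end of tr: append the row, then reset the cells
TD=: (TD,<'1 0'),<'1 1'    NB. second row, same steps
TR=: TR,TD
$TR                        NB. 2 2
```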
NB. using element character content
NB. selective processing based on element hierarchy position
require '~addons/xml/sax.ijs format'
saxclass 'prss'
startDocument=: 3 : 0
S=: ''
)
startElement=: 4 : 0
S=: S,<y
if. y-:'item' do. smoutput '' end.
)
endElement=: 3 : 0
S=: }:S
)
characters=: 3 : 0
s2=. _2{.S
if. s2-:;:'channel title' do. smoutput 'Channel: ',y
elseif. s2-:;:'channel description' do. smoutput fold y
elseif. s2-:;:'channel pubDate' do. smoutput 'Date: ',y
elseif. s2-:;:'item title' do. smoutput 'Topic: ',y
elseif. s2-:;:'item description' do. smoutput fold y
elseif. s2-:;:'item link' do. smoutput 'URL: ',y
end.
)
NB. =========================================================
cocurrent 'base'
TEST3=: 0 : 0
<channel><title>qq</title><pubDate>1/1/2006</pubDate></channel>
)
0 : 0 NB. Test
process_prss_ TEST3
process_prss_ fread jpath '~addons/xml/test/cnn.rss'
)
   process_prss_ TEST3
Channel: qq
Date: 1/1/2006
NB. chess -- a more complete example of custom parser
NB. transforms XML chess board into a J character matrix
require '~addons/xml/sax.ijs viewmat'
saxclass 'pchess'
COLORS=: ;:'whitepieces blackpieces'
PIECES=: ;:'pawn rook knight bishop queen king'
SYMBOLS=: 'PRNBQKprnbqk'
startElement=: 4 : 0
e=. <y
if. 2>C=. COLORS i. e do. COLOR=: C*6 return. end.
if. 6>P=. PIECES i. e do. PIECE=: SYMBOLS{~COLOR+P return. end.
if. -.'position'-:y do. return. end.
r=. <:0". x getAttribute 'row'
c=. 'abcdefgh' i. x getAttribute 'column'
empty BOARD=: PIECE (<r,c) } BOARD
)
startDocument=: 3 : 0
BOARD=: '. '{~ ~:/~2|i.8
)
endDocument=: 3 : 0
|.BOARD
)
NB. =========================================================
cocurrent 'base'
0 : 0 NB. Test
process_pchess_ fread jpath '~addons/xml/test/chess.xml'
viewbmp jpath'~addons/xml/test/chess.bmp'
)
   process_pchess_ fread jpath '~addons/xml/test/chess.xml'
. . . . q . . . k B . . p . . .P P. p . . .P. . . .P. PP. . . R K
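How a single position element ends up on the board can be sketched in isolation. The row and col strings are made-up stand-ins for what getAttribute would return:

```j
BOARD=: '. '{~ ~:/~ 2|i.8       NB. 8 by 8 matrix of alternating '.' and ' '
row=: ,'2'                      NB. hypothetical value of the row attribute
col=: ,'e'                      NB. hypothetical value of the column attribute
r=: <: 0 ". row                 NB. 1-based row number to 0-based index
c=: 'abcdefgh' i. col           NB. column letter to 0-based index
BOARD=: 'P' (<r,c) } BOARD      NB. place a white pawn on e2
|.BOARD                         NB. reversed so that row 8 is displayed first
```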
NB. interrupt on found data or error
NB. sax_test2 extended to stop parsing.
NB. Note: end element event is still handled
require '~addons/xml/sax.ijs'
saxclass 'pstop'
showattrs=: (''"_)`(' ' , ;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)
startDocument=: 3 : 0
L=: 0
V=: 'not found'
)
startElement=: 4 : 0
smoutput (L#' '),'[',y,(showattrs attributes x),']'
if. y-:,'p' do.
select. x getAttribute 'n'
case. ,'b' do. stop '' [ V=: x getAttribute 'v'
case. _1 do. stop 1001;'Attribute "n" missing'
end.
end.
L=: L+1
)
endElement=: 3 : 0
L=: L-1
smoutput (L#' '),'[/',y,']'
)
endDocument=: 3 : 0
smoutput 'Value of n=b is ',":V
)
NB. =========================================================
cocurrent 'base'
TEST4=: 0 : 0
<body><p n="a" v="11"/><p n="b" v="22"/><p n="c" v="33"/></body>
)
TEST4a=: 0 : 0
<body><p n="a" v="11"/><p n="c" v="33"/></body>
)
TEST4b=: 0 : 0
<body><p n="a" v="11"/><p v="22"/><p n="c" v="33"/></body>
)
0 : 0 NB. Test
process_pstop_ TEST4
process_pstop_ TEST4a
process_pstop_ TEST4b
)
   process_pstop_ TEST4
[body]
 [p n=a v=11]
 [/p]
 [p n=b v=22]
 [/p]
Value of n=b is 22
   process_pstop_ TEST4a
[body]
 [p n=a v=11]
 [/p]
 [p n=c v=33]
 [/p]
[/body]
Value of n=b is not found
   process_pstop_ TEST4b
[body]
 [p n=a v=11]
 [/p]
 [p v=22]
 [/p]
|xml error 1001 at (1 23): Attribute "n" missing: assert
|   (assert~error)0