XML Addon for J

Copyright (C) 2006. Oleg Kobchenko. All rights reserved.

An XML parser addon based on the Expat 2.0.0 library. It provides both a flat API and an object-oriented, SAX-like interface. Binaries for Windows, Linux x86 and Darwin PPC are included.

SAX parsing follows the push model, i.e. the API calls you. You provide the callback functions by overriding those of the base class (see the saxclass definition). These functions are called on XML node events.

A higher-level visitor design pattern can be obtained by defining verbs whose names consist of a prefix and the name of an element of interest, and calling them from startElement and endElement. This is similar to the way wd calls event verbs.
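The dispatch described above could look roughly like the following sketch. It is not part of the addon: the class name pvisit, the start_ and end_ prefixes, the handler verbs and the SEEN list are all hypothetical, and it assumes that name lookup (4!:0) and evoke (~) resolve handler names through the class locale, as ordinary verb calls do. Element names that are not valid J names (for example, names containing a colon) are simply skipped.

```j
NB. visitor-style dispatch: an illustrative sketch, not part of the addon
NB. pvisit, start_*, end_* and SEEN are hypothetical names

require '~addons/xml/sax.ijs'

saxclass 'pvisit'

startDocument=: 3 : 0
  SEEN=: ''
)

NB. call the verb named start_<element> if one is defined as a verb
startElement=: 4 : 0
  v=. 'start_',y
  if. 3 = 4!:0 <v do. x v~ y end.
)

NB. call the verb named end_<element> if one is defined as a verb
endElement=: 3 : 0
  v=. 'end_',y
  if. 3 = 4!:0 <v do. v~ y end.
)

NB. handlers exist only for the elements of interest
start_test=: 4 : 0
  SEEN=: SEEN , < ": x getAttribute 'a'
)

end_root=: 3 : 0
  SEEN=: SEEN , < '/',y
)

endDocument=: 3 : 0
  SEEN
)

cocurrent 'base'

NB. usage:
NB.   process_pvisit_ '<root><test a="11"/><test a="12"/></root>'
```

Elements without a matching handler fall through silently, so the class only needs verbs for the nodes it cares about.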

In your class, you maintain the state and selectively process the events. The event for text between tags is called characters; it is demonstrated in the table and rss examples.

In the rss example, a simple stack of nested elements is maintained in the list S. The characters handler then processes the text according to the current context.

The result of process is the output of endDocument, which is the last event called, so that is where you return your result.
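As a minimal sketch of returning a value this way, the following class counts elements and hands the count back through endDocument. The class name pcount and the global N are hypothetical; only saxclass, the event verbs and process are taken from the examples in this document.

```j
NB. returning a result from endDocument: an illustrative sketch
NB. pcount and N are hypothetical names

require '~addons/xml/sax.ijs'

saxclass 'pcount'

startDocument=: 3 : 0
  N=: 0
)

NB. one more element seen; attributes (x) and name (y) are ignored
startElement=: 4 : 0
  N=: N+1
)

NB. the value of this verb becomes the result of process
endDocument=: 3 : 0
  N
)

cocurrent 'base'

NB. usage:
NB.   process_pcount_ '<root><test a="11"/><test b="12"/></root>'
```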

Examples

These are listings and results of some examples found in the test folder.

sax_test2.ijs

NB. object oriented sax parser specialization
NB. extended to use attributes and levels

require '~addons/xml/sax.ijs'

saxclass 'psax2'

showattrs=: (''"_)`(;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)

startDocument=: 3 : 0
  L=: 0
)

startElement=: 4 : 0
  smoutput (L#'  '),'[',y,' ',(showattrs attributes x),']'
  L=: L+1
)

endElement=: 3 : 0
  L=: L-1
  smoutput (L#'  '),'[/',y,']'
)

NB. =========================================================
cocurrent 'base'

TEST1=: 0 : 0
<root><test a="11"/><test b="12"/></root>
)

0 : 0  NB. Test
process_psax2_ TEST1
process_psax2_ fread jpath '~addons/xml/test/chess.xml'
)
   process_psax2_ TEST1
[root]
  [test a=11]
  [/test]
  [test b=12]
  [/test]
[/root]

sax_test3.ijs

NB. object oriented sax parser specialization
NB. extended to use text characters

require '~addons/xml/sax.ijs'

saxclass 'psax3'

showattrs=: (''"_)`(}.@;@:((',' , [ , '='"_ , ])&.>/"1))@.(*@#)

startDocument=: 3 : 0
  L=: 0
  IGNOREWS=: 1
)

startElement=: 4 : 0
  smoutput (L#'  '),'',y,'(',(showattrs attributes x),') {'
  L=: L+1
)

endElement=: 3 : 0
  L=: L-1
  smoutput (L#'  '),'}'
)

characters=: 3 : 0
  smoutput (L#'  '),y
)

NB. =========================================================
cocurrent 'base'

TEST3=: 0 : 0
<body><p a="11">s123</p>Between<q b="12" c="3">z456</q></body>
)

0 : 0  NB. Test
process_psax3_ TEST3
process_psax3_ fread jpath '~addons/xml/test/table.xml'
)
   process_psax3_ TEST3
body() {
  p(a=11) {
    s123
  }
  Between
  q(b=12,c=3) {
    z456
  }
}

table.ijs

NB. using element character content
NB. inter-tag and surrounding whitespace is ignored

require '~addons/xml/sax.ijs format'

saxclass 'ptable'

endElement=: 3 : 0
  if. y-:'tr' do. TD=: '' [ TR=: TR,TD end.
)

characters=: 3 : 'TD=: TD,<y'

startDocument=: 3 : 'TR=: empty TD=: i.0 [ IGNOREWS=: 1'
endDocument=: 3 : 'TR'

NB. =========================================================
cocurrent 'base'

TEST4=: 0 : 0
<table><tr>  <td>0 0 </td>  <td> 0 1</td>  </tr>
      <tr>   <td>1 0 </td>  <td> 1 1</td>  </tr></table>
)

0 : 0  NB. Test
process_ptable_ TEST4
process_ptable_ fread jpath '~addons/xml/test/table.xml'
)
   process_ptable_ TEST4
+---+---+
|0 0|0 1|
+---+---+
|1 0|1 1|
+---+---+

rss.ijs

NB. using element character content
NB. selective processing based on element hierarchy position

require '~addons/xml/sax.ijs format'

saxclass 'prss'

startDocument=: 3 : 0
  S=: ''
)

startElement=: 4 : 0
  S=: S,<y
  if. y-:'item' do. smoutput '' end.
)

endElement=: 3 : 0
  S=: }:S
)

characters=: 3 : 0
  s2=. _2{.S
  if. s2-:;:'channel title'       do. smoutput 'Channel: ',y elseif.
      s2-:;:'channel description' do. smoutput fold y elseif.
      s2-:;:'channel pubDate'     do. smoutput 'Date: ',y elseif.
      s2-:;:'item title'          do. smoutput 'Topic: ',y elseif.
      s2-:;:'item description'    do. smoutput fold y elseif.
      s2-:;:'item link'           do. smoutput 'URL: ',y end.
)

NB. =========================================================
cocurrent 'base'

TEST3=: 0 : 0
<channel><title>qq</title><pubDate>1/1/2006</pubDate></channel>
)

0 : 0  NB. Test
process_prss_ TEST3
process_prss_ fread jpath '~addons/xml/test/cnn.rss'
)
   process_prss_ TEST3
Channel: qq
Date: 1/1/2006

chess.ijs

NB. chess -- a more complete example of custom parser
NB. transforms XML chess board into a J character matrix

require '~addons/xml/sax.ijs viewmat'

saxclass 'pchess'

COLORS=: ;:'whitepieces blackpieces'
PIECES=: ;:'pawn rook knight bishop queen king'
SYMBOLS=: 'PRNBQKprnbqk'

startElement=: 4 : 0
  e=. <y
  if. 2>C=. COLORS i.e do. COLOR=: C*6 return. end.
  if. 6>P=. PIECES i.e do. PIECE=: SYMBOLS{~COLOR+P return. end.
  if. -.'position'-:y do. return. end.

  r=. <:0". x getAttribute 'row'
  c=. 'abcdefgh' i. x getAttribute 'column'
  empty BOARD=: PIECE (<r,c) } BOARD
)

startDocument=: 3 : 0
  BOARD=: '. '{~ ~:/~2|i.8
)

endDocument=: 3 : 0
  |.BOARD
)

NB. =========================================================
cocurrent 'base'

0 : 0  NB. Test
process_pchess_ fread jpath '~addons/xml/test/chess.xml'
viewbmp jpath'~addons/xml/test/chess.bmp'
)
   process_pchess_ fread jpath '~addons/xml/test/chess.xml'
 . . . .
q . . . 
 k B . .
p . . .P
P. p . .
.P. . . 
 .P. PP.
. . R K 

stop.ijs

NB. interrupt on found data or error
NB. sax_test2 extended to stop parsing.
NB. Note: end element event is still handled

require '~addons/xml/sax.ijs'

saxclass 'pstop'

showattrs=: (''"_)`(' ' , ;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)

startDocument=: 3 : 0
  L=: 0
  V=: 'not found'
)

startElement=: 4 : 0
  smoutput (L#'  '),'[',y,(showattrs attributes x),']'
  if. y-:,'p' do.
    select. x getAttribute 'n'
    case. ,'b' do. stop '' [ V=: x getAttribute 'v'
    case. _1   do. stop 1001;'Attribute "n" missing'
    end.
  end.
  L=: L+1
)

endElement=: 3 : 0
  L=: L-1
  smoutput (L#'  '),'[/',y,']'
)

endDocument=: 3 : 0
  smoutput 'Value of n=b is ',":V
)

NB. =========================================================
cocurrent 'base'

TEST4=: 0 : 0
<body><p n="a" v="11"/><p n="b" v="22"/><p n="c" v="33"/></body>
)
TEST4a=: 0 : 0
<body><p n="a" v="11"/><p n="c" v="33"/></body>
)
TEST4b=: 0 : 0
<body><p n="a" v="11"/><p v="22"/><p n="c" v="33"/></body>
)

0 : 0  NB. Test
process_pstop_ TEST4
process_pstop_ TEST4a
process_pstop_ TEST4b
)

   process_pstop_ TEST4
[body]
  [p n=a v=11]
  [/p]
  [p n=b v=22]
  [/p]
Value of n=b is 22
   
   process_pstop_ TEST4a
[body]
  [p n=a v=11]
  [/p]
  [p n=c v=33]
  [/p]
[/body]
Value of n=b is not found
   
   process_pstop_ TEST4b
[body]
  [p n=a v=11]
  [/p]
  [p v=22]
  [/p]
|xml error 1001 at (1 23): Attribute "n" missing: assert
|       (assert~error)0

