Query tables from plain text files #8

@leftmove

The most recent addition to wallstreetlocal was the ability to query XML files along with HTML files. The only format left to support is plain text (TXT).

The SEC's XML and HTML stock tables were barely structured enough to be queried accurately, but TXT poses an even harder challenge. The problem is inconsistency. While tables in TXT can be read fairly easily by human eyes, they are too dissimilar from one another to query effectively.

Here are some minified examples.

<TABLE>            <C>                                              <C>
                                            FO     RM 13F IFORMATIONTABLE
                                            VALUE  SHARES/ SH/ PUT/ INVSTMT  OTHER  VOT    ING AUTRITY
NAME OF ISSUER     TITLE OF CLASS CUSIP     (X1000)PRN AMT PRN CALL DSCRETN  MANAGERSOLE   SHARED NONE
------------------------------------------- ------------------ ---- -------  -----------------------------
AFLAC INC          COMMON STOCK   001055102       3      71SH       DEFINED       71      0      0
AGL RESOURCES INC  COMMON STOCK   001204106     123    3025SH       DEFINED     3025      0      0
ABBOTT LABS COM    COMMON STOCK   002824100    1606   30519SH       DEFINED    27798   2721      0
ABERCROMBIE & FITCHCOMMON STOCK   002896207       0       2SH       DEFINED        2      0      0
AIR PRODUCTS & CHEMCOMMON STOCK   009158106   16728  175017SH       DEFINED   140030   2282  32705
AIRGAS INC         COMMON STOCK   009363102       4      52SH       DEFINED       52      0      0
</TABLE>
<TABLE>                   <C>   <C>        <C>       <C>                <C>   <C>   <C>
                                             VALUE                       INV.  OTH   vtng
      NAME OF ISSUER      CLASS    CUSIP    (x$1000)       SHARES        disc  MGRS  AUTH

Albertson College of Idaho Large Growth
ADC TELECOMMUNICATIO      COMM  000886101         $18         337.00     Sole  N/A   Sole
AFLAC INC                 COMM  001055102         $14         298.00     Sole  N/A   Sole
AES CORP                  COMM  00130H105         $18         232.00     Sole  N/A   Sole
AXA FINL INC              COMM  002451102         $18         504.00     Sole  N/A   Sole
ABBOTT LABS               COMM  002824100         $61       1,724.00     Sole  N/A   Sole
ABERCROMBIE & FITCH       COMM  002896207          $2         114.00     Sole  N/A   Sole
</TABLE>
<TABLE>
                                                             VALUE    SHARES/ SH/ PUT/ INVSTMT            -----VOTING AUTHORITY-----
  NAME OF ISSUER                 -TITLE OF CLASS- --CUSIP-- (X$1000)  PRN AMT PRN CALL DSCRETN -MANAGERS-     SOLE   SHARED     NONE
                                 <C>                                              <C>
D DAIMLERCHRYSLER AG             ORD              D1668R123        5      112 SH       DEFINED 05              112        0        0
D DAIMLERCHRYSLER AG             ORD              D1668R123       31      748 SH       DEFINED 05              748        0        0
D DAIMLERCHRYSLER AG             ORD              D1668R123        5      130 SH       DEFINED 06              130        0        0
D DAIMLERCHRYSLER AG             ORD              D1668R123      246     5894 SH       DEFINED 14             3089        0     2805
D DAIMLERCHRYSLER AG             ORD              D1668R123      118     2832 SH       DEFINED 14             2104      604      124
D DAIMLERCHRYSLER AG             ORD              D1668R123        8      200 SH       DEFINED 29              200        0        0
D DAIMLERCHRYSLER AG             ORD              D1668R123       63     1510 SH       DEFINED 41                0        0     1510
</TABLE>

The column sizes, names, and overall formatting of each table change too often for any meaningful code to be written. Short of writing a gargantuan amount of code, or using AI (which is expensive), there doesn't seem to be a good way to query stocks like this.
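To make the problem concrete, here is a rough sketch of the obvious approach: treat any character position that is blank in every row as a column separator. The function names `infer_column_spans` and `split_row` are made up for illustration and are not part of wallstreetlocal. It works when a table is cleanly aligned, but it falls apart on rows like `ABERCROMBIE & FITCHCOMMON STOCK` in the first sample, where the issuer name runs straight into the class column and spaces inside names line up by accident.

```python
def infer_column_spans(rows):
    """Return (start, end) spans of character positions that are
    non-blank in at least one row, separated by all-blank positions."""
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]  # equalize row lengths
    spans = []
    start = None
    for i in range(width):
        blank_everywhere = all(row[i] == " " for row in padded)
        if not blank_everywhere and start is None:
            start = i  # a column begins here
        elif blank_everywhere and start is not None:
            spans.append((start, i))  # the column ended just before i
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def split_row(row, spans):
    """Slice one row into cells using the inferred spans."""
    return [row[a:b].strip() for a, b in spans]
```

Because the spans are inferred per table, this at least copes with differing column widths, but it has no answer for fused columns or for header lines that do not align with the data.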

There should be a better, more effective method for taking the TXT tables and creating usable, structured data.
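One direction that might be worth trying, sketched below, is to ignore column positions entirely and anchor each row on its CUSIP, since that is the one field with a predictable shape in all three samples. This is only a sketch under assumptions: it assumes a CUSIP is eight uppercase alphanumeric characters followed by a numeric check digit, that the first two numbers after the CUSIP are the value (in thousands) and the share amount, and the names `parse_row`, `CUSIP_RE`, and `NUMBER_RE` are hypothetical. It does not try to separate the issuer name from the class, because the samples show those two can be fused.

```python
import re

# Assumption: 8 uppercase alphanumerics plus a numeric check digit.
CUSIP_RE = re.compile(r"\b[0-9A-Z]{8}[0-9]\b")
# Numbers such as 3, 30519, 1,724.00 (a leading "$" is simply skipped).
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def to_number(text):
    """Convert '1,724.00' style text to a float."""
    return float(text.replace(",", ""))


def parse_row(line):
    """Return a dict for a holding row, or None for headers, rules,
    tags, and any other line without a CUSIP followed by two numbers."""
    match = CUSIP_RE.search(line)
    if match is None:
        return None
    numbers = NUMBER_RE.findall(line[match.end():])
    if len(numbers) < 2:
        return None
    return {
        # Name and class stay together; whitespace runs are collapsed.
        "issuer_and_class": " ".join(line[:match.start()].split()),
        "cusip": match.group(),
        "value_thousands": to_number(numbers[0]),  # assumed first number
        "shares": to_number(numbers[1]),           # assumed second number
    }
```

Running `parse_row` over every line between `<TABLE>` and `</TABLE>` and dropping the `None` results would skip headers and separators without any per-table configuration. It would still mis-parse a table that lists shares before value, so it is a starting point rather than a full answer.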
