# Legal Corpus Roadmap

This roadmap tracks coverage of the inventory listed in `LEGAL_CORPUS_IMPORT_LIST.md`.

## Summary Statistics (as of 2025-11-28)

| Metric | Value |
| --- | --- |
| **Total inventory items** | 153 |
| **Status: success** | 64 |
| **Status: error** | 26 |
| **Status: no_direct_link** | 63 |
| **Chroma vector count** | 5,290 vectors |
| **Raw corpus size** | ~16 MB |

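The status counts above can be reproduced by tallying the manifest. The following is a minimal sketch only: it assumes the manifest can be loaded as a list of records that each carry a `status` field, and both that layout and the helper name `summarize_statuses` are hypothetical rather than taken from the project. The sample records are synthetic and merely mirror the reported counts.

```python
from collections import Counter


def summarize_statuses(entries):
    """Count manifest entries per status value (hypothetical record layout)."""
    return Counter(entry["status"] for entry in entries)


# Synthetic stand-in for the real manifest, built to mirror the reported counts.
manifest = (
    [{"status": "success"}] * 64
    + [{"status": "error"}] * 26
    + [{"status": "no_direct_link"}] * 63
)

counts = summarize_statuses(manifest)
total = sum(counts.values())
print(dict(counts), total)
```

Comparing `total` against the inventory size is a cheap way to catch entries that were never assigned a status.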
## UK P0 Document Completion (COMPLETE)

All 5 critical UK P0 documents have been integrated into the corpus as of 2025-11-28:

| Document | Size | SHA-256 (truncated) | Status | Chunks |
| --- | --- | --- | --- | --- |
| Employment Rights Act 1996 | 1,031 KB | 3fc1af7f2d48cb73ac065b39b75fa0cd... | success | 6 |
| Patents Act 1977 | 455 KB | cf62370ebed67cc448aec06955d1f33c... | success | 3 |
| Trade Secrets (Enforcement, etc.) Regulations 2018 | 18 KB | bfd00428c7b9c723ca50aafba8e0a9b2... | success | 1 |
| Social Security Contributions (Intermediaries) Regulations 2000 | 19 KB | dd9655af3e235f04c8cb06ec1e6a406f... | success | 1 |
| Copyright and Rights in Databases Regulations 1997 | 18 KB | 5c5fee5d641e4999fc2846ff3837758c... | success | 1 |

- **Total UK documents ingested:** 12 (7 pre-existing + 5 new P0)
- **Total UK vectors added:** 28
- **UK collection status:** Complete for freelancer contract analysis

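The truncated SHA-256 values in the table can be used to confirm that a local copy matches the ingested file. This is a sketch under stated assumptions: the helper names `sha256_hex` and `matches_prefix` are illustrative, and the file paths are whatever the downloader actually wrote, which this document does not specify.

```python
import hashlib


def sha256_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_prefix(path, expected_prefix):
    """Compare a file's digest against a truncated hash such as '3fc1af7f...'."""
    prefix = expected_prefix.rstrip(".").lower()
    return sha256_hex(path).startswith(prefix)
```

A prefix match on 32 hex characters is a strong integrity signal, but storing the full digest in the manifest would make the check exact.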
## Download Status by Category

| Category | Downloaded | Errors | No Link | Total |
| --- | --- | --- | --- | --- |
| US Federal | 8 | 7 | 9 | 24 |
| US State | 8 | 2 | 8 | 18 |
| EU | 0 | 8 | 2 | 10 |
| Germany | 6 | 0 | 1 | 7 |
| France | 0 | 4 | 1 | 5 |
| Canada | 8 | 0 | 1 | 9 |
| Australia | 6 | 0 | 0 | 6 |
| UK | 12 | 0 | 0 | 12 |
| Datasets | 1 | 0 | 2 | 3 |
| Case Law | 0 | 0 | 25 | 25 |
| Industry Standards | 10 | 13 | 5 | 28 |
| Scripts | 0 | 0 | 6 | 6 |
| Estimated Totals | 0 | 0 | 8 | 8 |
| **TOTAL** | **91** | **45** | **17** | **153** |

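Because the per-category rows and the TOTAL row are maintained by hand, a small consistency check is useful whenever the table is updated. The sketch below parses the table as printed above, sums each column, and reports any column where the stated total differs from the computed one. It makes no claim about which figure is correct; reconciling a mismatch requires the manifest. The helper name `parse_rows` is illustrative.

```python
TABLE = """\
| Category | Downloaded | Errors | No Link | Total |
| --- | --- | --- | --- | --- |
| US Federal | 8 | 7 | 9 | 24 |
| US State | 8 | 2 | 8 | 18 |
| EU | 0 | 8 | 2 | 10 |
| Germany | 6 | 0 | 1 | 7 |
| France | 0 | 4 | 1 | 5 |
| Canada | 8 | 0 | 1 | 9 |
| Australia | 6 | 0 | 0 | 6 |
| UK | 12 | 0 | 0 | 12 |
| Datasets | 1 | 0 | 2 | 3 |
| Case Law | 0 | 0 | 25 | 25 |
| Industry Standards | 10 | 13 | 5 | 28 |
| Scripts | 0 | 0 | 6 | 6 |
| Estimated Totals | 0 | 0 | 8 | 8 |
| **TOTAL** | **91** | **45** | **17** | **153** |
"""

COLUMNS = ["Downloaded", "Errors", "No Link", "Total"]


def parse_rows(table_text):
    """Map each category name to its list of integer cells."""
    rows = {}
    for line in table_text.strip().splitlines()[2:]:  # skip header and delimiter rows
        cells = [c.strip().strip("*") for c in line.strip().strip("|").split("|")]
        rows[cells[0]] = [int(c) for c in cells[1:]]
    return rows


rows = parse_rows(TABLE)
stated = rows.pop("TOTAL")
computed = [sum(column) for column in zip(*rows.values())]

# Collect every column where the stated total disagrees with the column sum.
mismatches = {
    name: {"stated": s, "computed": c}
    for name, s, c in zip(COLUMNS, stated, computed)
    if s != c
}
print(mismatches)
```

With the figures currently in the table, all four columns are reported as mismatched, so the TOTAL row and the category rows should be reconciled against the manifest.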
## Unable to Download: Reasons and Workarounds
- Items without a direct URL (for example, some case law rows) are marked `no_direct_link` in the manifest. Where possible, these can be automated by extending the downloader to look up citations through CourtListener or a similar API.
- HTTP 403 Forbidden responses from several EU and French sources (for example legifrance.gouv.fr) and from sagaftra.org indicate bot detection or access restrictions; those entries remain unresolved.
- Connection timeouts on house.gov and fairwork.gov.au have been resolved with alternate endpoints (`laws.justice.gc.ca`, `legislation.gov.au`). The French case law still needs manual attention or an API-assisted download.
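The failure modes listed above map naturally onto the three manifest statuses. The following is a minimal sketch of that mapping, not the project's downloader: the function name `classify_outcome`, its parameters, and the reason strings are all assumptions made for illustration. Only the status values `success`, `error`, and `no_direct_link` come from this document.

```python
def classify_outcome(url, http_status=None, timed_out=False):
    """Return a (status, reason) pair for one inventory item."""
    if not url:
        # Nothing to fetch: the inventory row has no direct link.
        return "no_direct_link", "no URL in inventory"
    if timed_out:
        return "error", "connection timeout"
    if http_status is None:
        return "error", "no response"
    if http_status == 403:
        return "error", "HTTP 403 (possible bot detection or access restriction)"
    if 200 <= http_status < 300:
        return "success", "downloaded"
    return "error", f"HTTP {http_status}"


print(classify_outcome(None))
print(classify_outcome("https://example.org/doc.pdf", http_status=403))
print(classify_outcome("https://example.org/doc.pdf", timed_out=True))
print(classify_outcome("https://example.org/doc.pdf", http_status=200))
```

Recording the reason alongside the status keeps timeouts, which may succeed on retry or through an alternate endpoint, distinguishable from 403 responses, which usually need manual handling.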