Massive congrats on getting letters to emerge! The fact that a generalist model trained on fragments could translate to PHerc. 1667 without scroll-specific labeling is a huge validation of transfer learning in this domain. I experimented with similar cross-domain models on satellite imagery a few years back, and the hardest part was always avoiding overfitting to the noise patterns of one sensor. The 2.4 µm resolution jump probably gave the model enough fidelity to distinguish actual ink from background texture, and that's gonna be critical for scaling across more scrolls.
Incredible breakthrough on the generalist model front. Training a detector across multiple fragments (500P2, 343P, 9B) and having it work on a completely new scroll without per-scroll tuning is where this needed to go. I ran into a similar problem last year where scroll-specific models kept overfitting to texture artifacts instead of generalizing to ink, so seeing transfer learning actually hold up here is wild.
For the high-res volume 4 (2.4 µm pixel size), can we access the data via ash2txt, or is the Kaggle competition dataset all that is currently available?
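For context, the access pattern I'm hoping for is a plain HTTP pull like the sketch below. The directory path is purely a guess modeled on how other volumes are laid out on the server, so treat it as a placeholder and check the actual directory listing:

```python
# Sketch: fetching one slice of the 2.4 µm volume over HTTP, assuming it
# lands on the dl.ash2txt.org data server like the existing volumes do.
import requests

BASE_URL = "https://dl.ash2txt.org"
# Hypothetical path -- browse the server's directory listing for the real one.
SLICE_PATH = "/fragments/Frag1.volpkg/volumes/volume_4/00000.tif"

# Registered users receive HTTP basic-auth credentials with the data license.
resp = requests.get(BASE_URL + SLICE_PATH, auth=("user", "password"), timeout=120)
resp.raise_for_status()

with open("00000.tif", "wb") as out:
    out.write(resp.content)
print(f"Downloaded {len(resp.content)} bytes")
```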
That’s fantastic news. Congratulations!
That's great news!
So exciting! Congratulations!