I figured out how to do exactly what I was looking to do in terms of citing sources with .md files stored in Spaces buckets (https://www.digitalocean.com/community/questions/linking-to-source-documents). It’s actually quite easy if you just request the sources/context from the API. Now I have a new challenge: I would like to move to a .csv file because I want to add columns of metadata next to the content, so that the metadata and content are unified in a single knowledge base.
The problem is that I was relying on the file names, bucket names and directories to construct the original source. With a single large .csv file, every source ends up as /bucketname/foldername/some_big.csv
Maybe the only solution is to create an individual .csv for each and every .md file. That is an OK solution, but I was curious whether that would be the best practice, whether there is a better way, and whether it can be accomplished with a single large .csv.
This may be useful for other projects as well, since datasets are often provided as .parquet files, which are easy to dump into a large .csv file.
(it would be great if knowledge bases had support for .parquet files directly)
Hi there,
Really great to see you pushing the GenAI platform this way. You’re getting into the kind of real-world use cases that can really help shape future improvements!
From what I understand, with a single large `.csv`, it’s tricky to track individual sources properly since everything points back to the same file. Splitting into multiple `.csv` files (one per logical source) might be the more reliable approach right now, similar to how multiple `.md` files work. But I’m not 100% sure if that’s the only way, so it might be worth checking directly with DigitalOcean Support.

Also, full support for `.parquet` files or more flexible metadata would definitely be a great improvement. I’d really encourage you to send this feedback to DigitalOcean Support; you’re raising exactly the kinds of points that could help improve the product and documentation over time.
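If you do go with one `.csv` per source, the split itself can be scripted. Here is a minimal sketch, not an official tool: it assumes your big file has a column holding each row's original bucket-relative path. The column name `source_path` and the helper `split_csv_by_source` are hypothetical names I made up for illustration, and you would still need to upload the resulting files to your bucket yourself.

```python
import csv
from pathlib import Path


def split_csv_by_source(big_csv, out_dir, source_column="source_path"):
    """Write one small CSV per distinct value found in `source_column`.

    Assumption: `source_column` (a hypothetical column name) holds the
    original path of each document, e.g. "guides/a.md". Every other
    column is copied unchanged, so the metadata stays next to the content.
    """
    out_dir = Path(out_dir)
    rows_by_source = {}

    with open(big_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            rows_by_source.setdefault(row[source_column], []).append(row)

    written = []
    for source, rows in rows_by_source.items():
        # Mirror the original folder layout and swap the suffix for .csv,
        # so "guides/a.md" becomes "<out_dir>/guides/a.csv".
        relative = Path(source.lstrip("/")).with_suffix(".csv")
        target = out_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

Because each output file keeps the folder structure of the original `.md` path, the file name and directory that come back with the retrieved context should again be enough to reconstruct the source link, the same way you were doing it before.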
- Bobby