Good vs. Evil

  • The FTB Forum is now read-only, and is here as an archive. To participate in our community discussions, please join our Discord! https://ftb.team/discord

lenscas

Over-Achiever
Jul 31, 2013
2,015
1,801
248
Remember that I need to make a webshop for school? And that we ended up deciding to sell Magic cards instead of fruit?

Well.... I may be in trouble. Let me explain:
Every card has a cost, consisting of symbols. So, I decided to make a table called Symbols containing the name, the picture and the text representation. So far, so good.
After that, I decided to create a table called Costs, so cards can reference it to say what their cost is. That way it's easy to get cards that have a certain cost.
However, as there is no limit on the number of symbols inside a cost, I also created a SymbolsInCosts table. This contains the columns CostId and SymbolId and is used to determine which symbols are in a cost.

So, let's say I want every card that has a mana cost of {3}{w}{u}.
The first step is easy: just look up each symbol in the Symbols table and get its id. Let's say the result is {3} = 1, {w} = 2 and {u} = 3.
Then, I need to look into SymbolsInCosts and get every row where SymbolId is either 1, 2 or 3.
Then, I need to look at the CostIds returned and filter out every cost where not ALL 3 symbols are referenced.
Then, I need to filter out every CostId which also has MORE symbols than those 3.

This.... is a pain to do in SQL, which is kind of what I want to use....
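For what it's worth, this "exactly these symbols" lookup is the classic relational-division problem, and it can be done with a single GROUP BY/HAVING. Here is a minimal SQLite sketch; the table and column names follow the schema described above, the sample data and ids are made up, and it assumes a symbol appears at most once per cost (a cost like {w}{w} would need per-symbol counts instead):

```python
import sqlite3

# Sketch of the "cost is exactly {3}{w}{u}" lookup.  Assumption: no
# symbol repeats within a cost; names mirror the tables in the post.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Symbols(id INTEGER PRIMARY KEY, textRepresentation TEXT);
    CREATE TABLE SymbolsInCosts(CostId INTEGER, SymbolId INTEGER);
    INSERT INTO Symbols VALUES (1, '{3}'), (2, '{w}'), (3, '{u}'), (4, '{b}');
    -- cost 1 = {3}{w}{u}, cost 2 = {3}{w}, cost 3 = {3}{w}{u}{b}
    INSERT INTO SymbolsInCosts VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2),
        (3, 1), (3, 2), (3, 3), (3, 4);
""")

# Keep only costs where every row's symbol is in the wanted set and
# the number of distinct symbols equals the size of the wanted set.
rows = con.execute("""
    SELECT CostId
    FROM SymbolsInCosts
    GROUP BY CostId
    HAVING SUM(SymbolId IN (1, 2, 3)) = COUNT(*)
       AND COUNT(DISTINCT SymbolId) = 3
""").fetchall()
print(rows)  # only cost 1 matches
```

Cost 2 fails the distinct-count check (a subset), and cost 3 fails the first check because it has an extra symbol, which is exactly the two filtering steps described above.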
 

lenscas

So... slowly but surely, every field of every table gets filled in.
I'm still missing:
  • cardFaces
    toughness
  • cardsInSet
    cardId
    setId
  • Color
    name
    landType
  • CostSymbols
    picturePath
  • imageURL
    small
    normal
    large
    png
    art_crop
    border_crop
    printId
  • Languages
    name
  • Legalities
    backgroundColor
  • Parts
    partOneId
    PartTwoId
  • Prints
    price
  • PrintFace
    PrintId
    FlavorText
  • Types
    name
  • TypesInCard
    cardid
    typeId
(Everything starting with a dot is a table, with the names underneath it being the columns of said table.)

Note that some things need to be filled in by hand; luckily, in those cases there aren't many rows in the table (like Color and Legalities, which only get 5).
 

lenscas

Manual data entry is always a bit of a pain.
A bigger pain is that the data is supplied as a massive JSON file. And by MASSIVE I mean that it takes up 559.0 MB (or more precisely, 586150517 bytes). Also, this is in minified form, so no new lines to make it pretty and readable.

It's actually so big that I had to switch languages for the import script, as the language I was going to use just crashed while parsing the file. Sure, I could've also used another library, but trying another language was easier than hunting one down.

It also does not contain everything I need. The price, for example, isn't listed, and there is other data I need that isn't included because it lives in another dataset (like every set, the languages, etc.).
 

lenscas

I just wish I didn't have to load the whole thing into memory at once. I get why they didn't go with a CSV file, as the data is just a bit too complex for that, but not having to load the whole thing into memory before it can be used would be nice.

Especially as my laptop (which is where I need the data) only has 8 GB of RAM, which means I can't really use it for anything else while the import is running. I already tried to unset everything that I didn't need anymore, but.... either my code was wrong (which I doubt) or it just doesn't help enough.

I guess I could make a script that splits the file, but assuming I do the easiest thing and go with 1 file = 1 card, I end up with about 199394 separate JSON files in one folder, which doesn't sound like a great idea. And making the folder structure more complex also doesn't sound like it's worth it, as ideally I only have to run the import once when it's done.
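There is also a middle ground between splitting the file and loading it all at once: parse the top-level array incrementally. A stdlib-only Python sketch (not what the import script actually used) that keeps just one item plus a read buffer in memory, using `json.JSONDecoder.raw_decode`:

```python
import json


def iter_json_array(fp, chunk_size=1 << 16):
    """Yield the items of a top-level JSON array one at a time.

    Sketch under two assumptions: the file is a single JSON array, and
    each item is an object or array (a bare number split across a
    chunk boundary could be mis-parsed).
    """
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    if not buf.startswith('['):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    while True:
        buf = buf.lstrip()
        if buf.startswith(']'):      # end of the array
            return
        if buf.startswith(','):      # separator between items
            buf = buf[1:]
            continue
        try:
            obj, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            more = fp.read(chunk_size)   # item continues in next chunk
            if not more:
                raise                    # truncated document
            buf += more
            continue
        yield obj
        buf = buf[end:]


# Tiny demo with an artificially small chunk size:
import io
cards = list(iter_json_array(io.StringIO('[{"a": 1}, {"b": [1, 2]}, {"c": 3}]'),
                             chunk_size=4))
```

Since each parsed item can be inserted into the database and then dropped, peak memory stays roughly at one card plus one buffer, regardless of the file's 559 MB size.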
 

duckfan77

Popular Member
Mar 18, 2013
80
683
118
Are there no other logical splits you could make? Even cutting it into thirds or fourths sounds like it might be sufficient to make it significantly easier to handle.
 

lenscas

Are there no other logical splits you could make? Even cutting it into thirds or fourths sounds like it might be sufficient to make it significantly easier to handle.
There are plenty, but they all make the code more annoying, either by making it harder to find the right file or by having to add code to automatically load in a new file.
There is one thing I could do, though: make a new file where each entry is on its own line and isn't part of the same JSON document anymore. That way, something like:
Code:
[{"somekey":"someData","somethingComplex":{"an array":[1,2,3]}},{"somekey2":"someData2","somethingComplex2":{"an array":[1,2,3]}}]
simply becomes
Code:
{"somekey":"someData","somethingComplex":{"an array":[1,2,3]}}
{"somekey2":"someData2","somethingComplex2":{"an array":[1,2,3]}}
And now I can simply read and parse every line separately. That would break if there were literal new lines inside the data, but a standard JSON serializer escapes them as \n inside strings, so a minified file shouldn't contain any; and even if it did, I could replace them with something sensible while transforming it.

Though, then again I'm almost done with this import script so.... why bother at this point?

A way to store data as text that can be read line by line and can hold more complex data than CSV would be nice.
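That format already exists, for what it's worth: one JSON document per line is usually called JSON Lines (or NDJSON), and it's exactly the transformation sketched above. A tiny Python sketch of writing and reading it:

```python
import json


def write_json_lines(items, out):
    # json.dumps escapes literal newlines as \n inside strings,
    # so each record is guaranteed to stay on a single line.
    for item in items:
        out.write(json.dumps(item) + "\n")


def read_json_lines(fp):
    # Parse one record at a time; only one line is in memory at once.
    for line in fp:
        if line.strip():
            yield json.loads(line)


# Round-trip demo with in-memory files:
import io
buf = io.StringIO()
write_json_lines([{"somekey": "someData"}, {"n": 1}], buf)
buf.seek(0)
records = list(read_json_lines(buf))
```

Because the reader iterates lines lazily, this also solves the memory problem from the import script: the full file never has to be parsed in one go.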
 

duckfan77

CSV can store some fairly complex data. Hierarchical data would be difficult to represent in one row, but it's probably doable, just more expensive on disk.
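One way that hierarchical-data-in-CSV can be made to work: keep scalar fields as normal columns and serialize any nested value into a JSON string inside a single cell; the csv module handles the quoting. A sketch with made-up card data:

```python
import csv
import io
import json

# Nested values go into one cell as a JSON string; csv quotes the
# embedded commas and double quotes for us.  Sample data is invented.
cards = [
    {"name": "Lightning Bolt", "cost": ["{r}"]},
    {"name": "Counterspell",   "cost": ["{u}", "{u}"]},
]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["name", "cost_json"])
for card in cards:
    writer.writerow([card["name"], json.dumps(card["cost"])])

# Reading it back: undo the JSON layer per cell.
buf.seek(0)
reader = csv.reader(buf)
next(reader)  # skip the header row
decoded = [{"name": n, "cost": json.loads(c)} for n, c in reader]
```

The trade-off is as noted above: the JSON cell costs extra quoting and escaping on disk, and tools that expect flat CSV can't see inside it.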