Announcement

Collapse
No announcement yet.

Music Chart Database

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Music Chart Database

    So I’ve been having this idea for a few years about putting together all available chart information into one big database. Obviously this is not a new idea. I'm sure anyone who is interested in music charts has had this idea and has had his own understanding on how to implement this idea and in fact probably has implemented it in some ways, making his private collection of music charts, keeping it in order one way or another (file system, Excel spreadsheets, actual database etc.). I'm sure some people on this forum might have rather full-fledged private solutions to manage this for themselves. But obviously I wanted to make something of my own which would solve many different problems for me like keeping all the data together, browsing and managing it through some nice interface, viewing chart runs or artist chart histories, being able to do break downs by years or decades or running any other custom analytics etc. When all these ideas are thrown in, the actual solution might get rather complex with many details to think of and architectural decisions to make etc. Not impossible with my expertise of many years of software development, but still requiring time and effort. So I was always going to do this, and over the time have been slowly designing various bits of it, but kept delaying the actual implementation until better times. Since last year I started to think that the time is up – can’t delay it anymore, as there is a high risk that some of the chart data may be permanently removed from the Internet. Sure, nothing can be truly removed from the Internet, but some of it can become very hard to find or inaccessible, so if I want to back it up and keep it safe, I need to do it now.

    So I really feel that now is the time for me to put this effort, and in fact last year I already started some actual design work, but there are always things that slow me down, and the initial motivational push may not be enough to do something for myself only. So I always knew that even though I’m doing this for me in the first place, I would probably share the end result with my friends on this forum who have the same interest in music charts as I do, and at the end of last year I did want to kickstart some initial rough version to get some feedback, but then I got caught up in various other stuff and the effort slowly died out. So it seems to me that to get this going I need a) to constantly stay motivated b) probably to try to get some help.

    Before I go any further, I first need to mention some important stuff.

    This project is obviously a private effort, and it is actually impossible to make the end result available to general public, e.g. host the public website or distribute the database file or provide any kind of end-point API to the data etc. It is impossible due to the legal and moral reasons, as some of the chart data is rightfully owned by other companies to which I only wish the best and want them to operate by whatever business model they see more fitting in order to be able to continue doing what they do even in the challenging times, with the least thing I want to do is to interfere with their operations and move people away from their own products. However, the fear that some of the data that was previously available through the owning parties may be removed and the historical knowledge that comes with it become lost forces me to take action and preserve as much data as possible. So it would be private project, but as mentioned before, I do want to share this archiving process and its end result with my friends, the people on this forum who have the same interests as I do, so I would gladly accept the interest in it, the feedback, good and bad, and the potential help anyone interested can give.

    If no one here is interested, I will probably go very slow, but I will still do it. But I really want to get to the end result faster, so by sharing this with you, I actually want to speed up this process. Having other people interested in this project would give me crucial motivation to go on with the implementation, plus maybe someone can actually give a helping hand either technically (most likely there are other people on this forum who can code), or helping out preparing raw data to bulk-upload into database, or helping with cross-checking the database data or something else. In exchange, I can promise that the end result and in fact any intermediate result of this work would always be available for all involved people at any time, so that if you join this affair you will always own the resulting data – with the only condition that this is of course to be treated as private data not to be shared with outside world in any way (because we have a friend club here, and not some evil conspiracy).

    And probably another important thing before I actually get to the point. As I said before, I believe some of you might have similar private solutions for themselves, in which case I want to assure that I am not doing this to challenge or undermine your many years of private efforts collecting and keeping the chart data, and I respect everyone who has done this and do not expect you to share all your secrets unless you really want to join and give some help, in which case any quality data that would help kickstart the project at the beginning or enrich it at any further stage would be truly divine. And also, if anyone of you is aware of already existing private club or private project which was established to solve these very problems – please let me join it so instead of wasting my time on something new I could help with the existing work.

    Will continue my text in the next post.

  • #2

    So the ideal end result that I currently envision will be a combination of 2 main parts.

    1) Database. Obviously the main driver of this project. It would contain music chart data (weekly charts, year-end, decade-end etc.), basically anything you could think of – US (BB, Cashbox, Record World), UK (Official Charts, NME etc.), other countries like Germany or Sweden, with all tracks linked to their artists, with all essential data like featured artists, producers (when it is necessary), album variants, connections between songs properly maintained. When some weekly chart data is missing we could maintain Guiness-style listings (e.g. 1980s UK Indie Charts). So it could potentially contain every possible it of data in one place, with the only one strong exception I can think of – it should not include UkChartsPlus data in respect to the people on this forum who are involved in producing it (I am not in any way affiliated with it, but it is important to keep it alive and profitable). Also I would probably suggest not to include the last 2 or 3 years of chart data to once again avoid any possibility of interference with commercial efforts of owners of the data (sure, I would privately back up new chart data in the raw form as to make it possible to include in database later on). In fact, besides the weekly charts, the database could include additional, and in my opinion somewhat related data, like awards data (e.g. Grammys), certifications numbers (RIAA), all-time/year-end critic lists – I think all these would fit in nicely.

    Database itself could be split in 2 parts – music data (artists, albums, songs, i.e. our own version of musicbrainz) which can potentially be openly shared and used in other projects as it would not contain any sensitive data, and chart data.

    Some inherent technical details – database would not reside on some server nor in the cloud, it would be in a single-file binary format (sqlite) as to be easily shared with involved people so that every member of the private club could own a copy.

    2) Software. I’m thinking of web interface with the possibility to browse data by years/weeks, artists, countries, individual charts etc. It could be configured to be non-editable(browse-only) and also have necessary CRUD functionality for data admins (note: actually most of the raw data would be inserted into database using separate automated tools, but it should be possible to insert charts manually and make amends/fixes to existing data; also most of the automatically inserted data would have to be validated manually anyway to be potentially fixed/adjusted). Any individual chart or chart run or artist data could be exported in Excel file, and more complex analytics tools could be provided (e.g. create a combined chart for any date range using some point system). Obviously the User Interface, unlike chart data, is not at all sensitive, so its code could potentially be open-sourced and reused by any interested party. I’m thinking of cloud-hosting one main instance of this web interface for me and those people who would wish to join the private club to help collect and maintain data, with private logins and for limited number of users, but any one of you could potentially host and maintain their own private instance in their own cloud for their own team or group of people to keep their own music/chart data for their own pleasure. Also, as modern technology allows, the same code could be also built as a desktop software program for individual private use at home if no web access is needed on the go.

    Continued in the next comment..

    Comment


    • #3
      OK, now if at this point you think it all sounds great but too good to be true, you would be right to assume that this won't be built in a day and even in a month, and it would have to be done in stages. Here's what I think the stages and building blocks for this project could be.

      First of all, if we want to glue together all the chart and music data to make it browseable and have all the fancy analytics on top etc., we obviously need to design the data model (think, database schema). This is actually almost done. I was designing it in my head for a long time, trying to incorporate all the interesting cases of songs/albums relations, examples of weird chart relations etc. and even implemented parts of it some time ago.

      Then we actually need to agree on exactly what charts we need to have first. We need to maintain the list of charts we want to have and info on every chart – metadata like start date of the chart or number of positions in the chart etc., status information on whether we can easily get this data or if it’s not available or partly unavailable for us to get etc.

      For each chart then, we would have a basic “building brick” process to import it into database. I.e. to import a particular chart run (for example, we want to import all weekly charts of UK Singles Chart from 1952 to 2019) we would need to follow roughly these steps:
      - Find the source of data for the chart run
      - Grab it (i.e. download, save as text files, crop image from magazine scan etc)
      - Convert it to the intermediate representation, either parse it using some code, or convert it manually. What is this intermediate representation, you ask? Excel file! This is actually the most important stage – any chart data or chart run that can be put into Excel file can then very easily be imported into the database by simple scripts. It might vary in detalization quality after import (based on how detailed was our Excel file), but the job will be done. SO next step is just this:
      - Import Excel file into the database. After this step the data is no longer raw – it lives in our database world now – but it is still not finished as a) it is not properly linked to actual artists or other entities that already exist in database, b) it was not verified by human eyes. So immediately after import dat ais not longer raw but it is still “semi-raw”. Actually, in some cases, some of the links can be established during import so that immediately after the import you can open the chart in web interface in the browser and be able right away to click through each artist/title of the song/album, but sometimes this can be a) incorrect b) undetermined so that the chart position in UI is plain text without links to other entities. So we have next step:
      - Human verify each chart through software web interface. Open chart and see if data was imported correctly. Mostly you would need to check new entries and re-entries because everything else would be determined automatically.

      Now as for the software, it would also be developed in stages. Some time ago I already made a first attempt to build such CRUD-type software to be able to read/update our data model, but faced some issues, so I need to revise the approach and change some parts of technological stack, so would almost have to start from scratch here. I do think that the very first version I could build could be centered around just the calendar and charts, i.e. showing the years/weeks and charts for each week, with each chart at this point only available in “semi-raw” stage, i.e. each position as a plain text (and we could also show “raw” source like magazine scan pages where applicable), not yet linked to artists. But this first version would be super-important as it would enable us to already start working on data, which is the most crucial part, even if it would not be yet presented as fancy as we target it and would only at first allow for limited human verification (i.e. only what concerns of the chart positions, with no possibility at this early stage to enter artist/album/song links and relations).

      Second stage could be to properly implement the rest of the data model and the ability to human verify and make amends not only to chart positions, but also to chart entities with links to artists and other relations.

      When that is done, third stage could be adding all the fancy analytics.

      Continued in the next comment..

      Comment


      • #4
        So if you still think that sounds good and you want to join my private party to give some help, how can you potentially help?

        First of all, maybe someone here can code in modern Javascript, NodeJs or Java/Kotlin, then maybe you can help with some of coding, or at least just let me talk to you and flood you with technical details (which sometimes helps sort things out in ones head)

        But truly the best help at this point would not be in the coding realm (I can manage that), but in the data realm. Step #1 is to create and prioritize the list of charts to save. Step #2 is to get raw data, back it up, and put it into Excel so that automated script can later import it into the database. Step #3, after Excel data is entered as "semi-raw" data is to verify it and make fixes/amends when needed.

        By the way, in the data realm we are going to have a big kickstart, as from the very start our database can contain all the US and UK charts currently available at the respected sites. And even though in case of US there would be the same deficiencies as can be found at the respected site, i.e. incomplete charts with missing positions, it would still save us huge effort on something that is probably of the biggest interest to the most of us.

        So if all this STILL does sound good to you, and you do want to participate in this party, please leave your feedback here and also send me a private message. I have no idea how big an interest for this can be, but if there is an actual interest, then we can organize our shared work, which we could then move externally, as probably we would need to setup some chatroom and collaborative tools.

        One slightly disappointing thing, however, could be that we will have to start very slow. Main reason is that in general, even though I am very motivated to do this project, at this very moment I do not have a lot of truly free time due to high workload and some family projects that have priority right now. It's not ideal, but we need to be realistic with the timeline. My January and February will be very busy (in fact right now I am at the busiest, even though I took some time to write this post), then it will get somewhat lighter for me for some time but mid-year will again be very busy time for me, with hopefully a lot of time at the last third of the year. What it means in practice is that a) even if there is immediate interest from you, be aware that due to my current workload the initial project bootstrap will be slow b) in terms of actual coding work I will realistically be able to put just about 8 hours a week on this (and probably not right now but only starting from March).

        But if you are still really interested in this, I can suggest the following timeline. Next week I will gather feedback and will understand how many people are interested to join the party. Then in early February we will start organizing our work – probably setup a chatroom and some collab tools, and priority #1 will be to create and prioritize a list of charts we want to enter. Then in February we can starting working on getting raw data / Excel spreadsheets – they would be shared among all participants, so it won’t be just me collecting all these data to run away. In March I can actually start a) putting initial Excel data + kickstart with UK&US charts that I already have into sqlite database file which would also be shared with all participants, b) start coding first stage of user interface. End of March / early April I could deliver first stage of software, i.e. browsing by years/weeks and viewing “semi-raw” chart data. Second stage of software – too early to say but I would only commit to deliver no early than September.

        So tell me what you think about all this. Any feedback is appreciated.

        Thank you.

        Comment


        • #5
          Hi all!

          So far I got just a few private messages on this topic, and from the first feedback I understand that I wrote too much text at once, and even though I was intentionally very detailed, some things were not very clear.

          1) It was me who wrote the chart conversion tool that is used on this forum for the weekly US topics, but it was done in just a few hours and does not represent how the database will look like - it will be much more pretty, fancy and usable.

          2) As mentioned, database from the starts will include the majority of official US and UK charts, which means more effort should be spent on getting other chart data, like "non-official" US/UK, other countries etc.

          Hope to get more thoughts about this.

          Thank you.

          Comment


          • #6
            Hello all
            It is a pity that there is so little resonance.

            Yes, I have sent Xsergan a private email and I would like to support him. In the many years I have been creating charts as a hobby, there have been many attempts in various forums or chats ... to create something like this.
            I can only support it and maybe it will be a database with a lot of information about music and a secure archive.
            Please just write a short comment here, it is simply motivation for Xsergan and other chart freaks to participate here.
            I was not sponsored

            Comment


            • #7
              Well as everyone here knows I have compiled multiple spreadsheets that contain all the Billboard charts SS. These include pre-1958 pop, C&W, and R&B singles charts (when they had 3 different charts) plus the current versions of these charts 1958 to present. I also have SS for the bubbling under, adult contemporary, and rock charts since their inception to now.

              I recently posted the the Cash Box C&W and R&B complete plus I did the Record World pop charts 1954-1982 almost complete. If you did not grab them when posted I’ll be more than happy to send them to you.

              I’m sure Lonnie will check in on this as he probably has much of the information you seek in his database. He assisted me greatly in the past. His US charts are pretty good but have errors. My charts will be in better shape but will likely contain errors - we all try for perfection and whenever I think I achieved it, someone will find an error.

              Comment


              • #8
                Some of us have done parts of the searching for nonofficial multiple UK charts over the years, but there’s always been too many weeks missing and too few of those who could support the work have taken part. DrTravel has done a great job regarding the three main US charts and Lonnie is the foremost UK chartist. There are still missing charts out there and this effort may be the one that has enough momentum to get hidden charts out in the open.

                Comment


                • #9
                  Very interested to see how this develops and willing to help in any way I can.

                  Comment


                  • #10
                    This sounds like a great idea. Unfortunately I have no real programming expertise, but I am always willing to fact check or can possibly create spreadsheets.

                    Comment


                    • #11
                      Will it really be necessary to put the data into a spreadsheet and then import it into the database. Seems like an unnecessary step. Don’t most people just capture the data and import it directly through coding?

                      Comment


                      • #12
                        Thank you all for your interest! Looks like we are going to have a quorum after all. I will keep you posted on the progress and next steps.

                        DrTravel spreadsheets is some really great stuff, I tried to follow the respectful thread - I'm not sure I was fast enough to save them all. I will double check what I have a bit later (when I get to drafting the list of charts and data availability statuses), and then work on filling the gaps. Thank you DrTravel for doing this and sharing.

                        Originally posted by Chartaholic View Post
                        Will it really be necessary to put the data into a spreadsheet and then import it into the database. Seems like an unnecessary step. Don’t most people just capture the data and import it directly through coding?
                        As for the spreadsheet step - sure, in some cases it is possible to directly adapt raw data sources into the database, but in other cases it might not be as trivial - imagine data available only in low quality magazine scans without practical possibility of OCR-ing it. Therefore in some cases spreadsheets of chart runs may have to be painstakingly created by hand, while in many other cases such spreadsheets are already available, so streamlining spreadsheet preparation to a separate step feels natural to me. In this case we may have various data adapters / manual processes designed to target spreadsheet as a unified and human manageable format. Then importing spreadsheet to database is a simple and straightforward procedure (think of excel->database conversion as a dedicated microservice).

                        Comment


                        • #13
                          Here is a partial list of potential concerns. These range from data entry to accuracy concerns.

                          Songs
                          - Linking remixes to a primary title. Example Perfect with and without Beyoncé.
                          - song titles different in different countries. Examples from a u in Honor/Honour to a completely different worded title. Really another linking issue,
                          - features and artist roles (I have already PMed about this).
                          - artist reusing the same title for a different song,
                          - will a standard title be used, or will it be the title on the chart. Example some charts like to mess around with the song title spellings. Even inconsistently. ARSA is pretty strict on using a default title, as an example of how one site handles this. Or should it match the actual charts title.

                          Artist
                          - using a different name in a different country example spinners/Detroit spinners
                          - change in name
                          - artists with the same name.
                          - see note on Songs about using name on chart or a default nsme

                          Albums
                          - re-releases with additional tracks. With title change
                          - different name/spellings in different places
                          - type of album example live, compilation, cast etc
                          - see note on songs about using name on chart or a default name,

                          Charts
                          - name change example modern rock/alternative

                          Most of this list is really almost all the same overall issue of linking and uniqueness.

                          perhaps it isn’t stuff to even be concerned about, but if it is, it should be implemented at the start, even if not used initially, rather than trying to add things in later which can cause all kinds of headaches.

                          some of the stuff like same artist names is just going to need to be watched for manually.
                          Last edited by Chartaholic; Thu January 23rd, 2020, 14:51.

                          Comment


                          • #14
                            The way I have handled multiple artists on a single track in my databases is using an additional "CanonicalArtist" field like below:
                            Artist CanonicalArtist Title SongID Country Peak etc...
                            Adele Adele ft. Lil Wayne Merry F**king Christmas 102296969 USA 2
                            Lil Wayne Adele ft. Lil Wayne Merry F**king Christmas 102296969 USA 2
                            Adele Adele ft. Lil Wayne Merry Christmas 102296969 UK 1
                            Lil Wayne Adele ft. Lil Wayne Merry Christmas 102296969 UK 1
                            There may well be better ways, but this way a simple select by artist gets everything by an artist, featured or primary, with the correct credit listing. If you wanted to separate the two the way some of the Record Research books do, you could just add another column to flag featured roles.

                            Comment


                            • #15
                              Other potential issues (all real incidents):
                              - The A/B sides are flipped and the song's name changes during its chart run. Also, the B-Side gets an A-Side catalogue number and official release; and two singles are merged.
                              - Writing credits are amended during a chart run
                              - Song which was charting in one version (possibly two versions merged) will be split in two charting songs (e.g. two different languages) and having the same LW and Weeks On when listed separately
                              - The very same song on two different labels will be merged to one charting position
                              - Song charting as an import and retaining its chart course when officially released domestically. As opposed to the import version falling behind and the new domestic release being a new entry
                              - Artist's name officially changing during a chart run
                              - Singles that have two different acts and songs on A/B and it's promoted as a double sided issue
                              - Singles that will have the B-Side featured alongside the initial hit once the other song is promoted (All during one chart run)
                              - Inconsistent categorization: It's a single in country A but a budget album in country B. It's a single (12" single or EP) in one chart and considered an album in another.
                              - How to handle entries that will be officially removed from a chart, e.g. after a lawsuit, chart manipulation, etc. A citation, footnote or explanation will be needed
                              - Obvious mistakes: The same song and artist is charting twice! Until further proof there were no two different releases (and the catalogue number is the same)
                              - etc., etc.
                              Certain that there are more hard nuts to crack
                              trebor's - 2016 in Country Music
                              trebor's - 2015 in Country Music

                              Comment


                              • #16
                                Here are all the Billboard R&B 1942-2020 Charts:

                                https://ufile.io/czb89zds

                                Comment


                                • #17
                                  GaryG it really depends on what you hope to accomplish with the data. If you want to do all time rankings and give less points, like Billboard, for features, you need to separate them out more and be able to track how many artists on the song,
                                  Last edited by Chartaholic; Fri January 24th, 2020, 03:56.

                                  Comment


                                  • #18
                                    Thanks Trevor, I was hoping others would follow suit with issues they have as well encountered.

                                    Comment


                                    • #19
                                      ^
                                      My stance is that a chart generated from a data base should look exactly the same as it was originally released on one specific week.
                                      Not so much an issue for the USA but countries like Germany, Italy, Spain, etc. will need a heck of footnotes and explanations added.

                                      //
                                      Also Soundtracks are tricky as they should retain their country specific name (as on the charts originally published) and the term "Soundtrack" varies from country to country.
                                      This means that a lot of manual work, general knowledge, country specific expertise is involved and simple csv data uploads may potentially not be possible.
                                      trebor's - 2016 in Country Music
                                      trebor's - 2015 in Country Music

                                      Comment


                                      • #20
                                        Originally posted by DrTravel View Post
                                        Here are all the Billboard R&B 1942-2020 Charts:

                                        https://ufile.io/czb89zds
                                        Thank you DrTravel !

                                        Comment


                                        • #21
                                          Originally posted by trebor View Post
                                          This means that a lot of manual work, general knowledge, country specific expertise is involved and simple csv data uploads may potentially not be possible.
                                          You are absolutely right, every data that is bulk inserted should be verified by a human expert. Still, bulk csv upload is a very reasonable time-saving step. We would need to mark new charts inserted this way as "not verified" or "under review", and hopefully get someone with expertise to check and put necessary amends or comments and register his approval.

                                          Comment


                                          • #22
                                            Oh, I understand and agree completely. Huge task ahead then.
                                            ( For some odd reason I was assuming the uploads would go directly live How silly of me.)
                                            trebor's - 2016 in Country Music
                                            trebor's - 2015 in Country Music

                                            Comment


                                            • #23
                                              And thank you all for the examples of these tricky cases. I also do have more examples like this.

                                              It is true that the data model must not be naive, and should be designed with all the tricky cases in mind from the start. A lot of things should be generalized rather than chosen based on some assumption of being inherent to the domain in question. In fact, tricky cases are often just the results of wrong assumptions. There is a classic example of how you should think just one bit more before making assumptions about seemingly obvious things - https://www.kalzumeus.com/2010/06/17...e-about-names/ - it is a very short but fun read.

                                              For example, an obvious thing would be to have a table for songs and a table for albums in our database, right? That would be most essential and very powerful, as we would be able to do all we want with these entities - link albums to songs, link all of them to artists, and of course link album charts to albums and song charts to songs. And for the most charts it would work, but then we would have very annoying tricky cases. E.g. if we separate "song charts" and "album charts" due to technical constraints of having to link to the proper table (either song table or album table), then how would we handle EPs, which should obviously be modeled as albums, but occasionally (very often) show up in the song charts? And there have even been examples of "mixed" charts by definition - e.g. Swedish charts from 1960s as far as I know included both singles and LPs. But this tricky case would not exist if we just put songs and albums (and EPs, and stuff like double A-sides) into one unified table (e.g. called "entities"). Sure, we would need to maintain an entity type column (wherever its applicable) and probably have to deal with different kinds of relations based on entities of which type are linked, but such generalization would immediately eliminate a lot of "tricky cases".

                                              This is just an example, and I will take time next week to document the data model I have in mind and how it would handle each and every "tricky case" I can think of, including those suggested by you. We will be able to answer every one of them.

                                              Sure, data model design per se is not enough - each case has to be properly entered into database. Like I originally said - after import from Excel data would still be "semi-raw" i.e. only the minimal and most obvious entity links would be possible to establishe automatically, so like trebor says, "a lot of manual work, general knowledge, country specific expertise" would be needed to resolve actual examples. But, coming full circle, without data model and user interface which would allow for each case to be gracefully resolved the potential expertise will have no application.
                                              Last edited by xsergan; Fri January 24th, 2020, 21:02.

                                              Comment


                                              • #24
                                                The NME Singles chart in the UK is another example of EP and albums charting in the singles chart.

                                                One either point is how to handle B-sides, writers, producers, etc. Should they be included? If so, then when you get to CDs the process for album tracks seems more usual to apply. For me, I have each Track with the potential to link to a Track Listing table which can include just the album tracks, or the collected B-sides of CD issues.
                                                http://thechartbook.co.uk - for the latest are best chart book - By Decade!
                                                Now including NME, Record Mirror and Melody Maker from the UK and some Billboard charts

                                                Comment


                                                • #25
                                                  Originally posted by trebor View Post
                                                  ^
                                                  My stance is that a chart generated from a data base should look exactly the same as it was originally released on one specific week.
                                                  Not so much an issue for the USA but countries like Germany, Italy, Spain, etc. will need a heck of footnotes and explanations added.
                                                  this can get messy, even the major us chart sources messed around with punctuation and brevity in titles and artists at times.

                                                  even more so when you start using the more minor sources, like radio charts. Who are famous for brevity sake of truncating titles and artists for space reasons,

                                                  Comment

                                                  Working...
                                                  X