Channel: RapidMiner Studio Forum topics

change attributes order


Hi there 

 

I have a data set with patient_ID and a number of other attributes, one per specific disease, where each column contains 1/0:

ID         Diabetes        Hypertension     Depression 

102            1                        0                       1

 

How can I generate a new attribute that lists all current diseases for each patient_ID, like this:

ID         Disease   

102      Diabetes, Depression 

 

Any thoughts?
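For readers who prefer to prototype this outside Studio, a pandas sketch of the desired transformation (the toy frame and column names simply mirror the example above) joins the names of the 1-valued columns per row:

```python
import pandas as pd

# Toy frame mirroring the example in the post: one 1/0 column per disease.
df = pd.DataFrame({
    "ID": [102, 103],
    "Diabetes": [1, 0],
    "Hypertension": [0, 1],
    "Depression": [1, 0],
})

disease_cols = ["Diabetes", "Hypertension", "Depression"]

# For each row, keep the names of the columns whose value is 1
# and join them into one comma-separated string.
df["Disease"] = df[disease_cols].apply(
    lambda row: ", ".join(c for c in disease_cols if row[c] == 1),
    axis=1,
)

result = df[["ID", "Disease"]]
```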

Thanks,

Abbas


Using decision tree operator to predict patterns of multiple data sets


Dear Sirs/Madams,

 

I have thousands of data sets, and each one contains the occupancy pattern of an individual household at 5-minute intervals over two consecutive years. I would like to use the Decision Tree operator to predict the occupancy patterns from my data sets. However, it seems that the Decision Tree operator can only be connected to one data set. How can I connect the operator to multiple data sets?
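One common approach is to append the individual data sets into a single example set before the learner. The same idea, sketched in Python with pandas and scikit-learn (the `hour`/`occupied` columns are hypothetical stand-ins for the real 5-minute occupancy data):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-household frames; in practice these would be
# loaded in a loop over the thousands of files.
households = [
    pd.DataFrame({"hour": [0, 6, 12, 18], "occupied": [1, 0, 0, 1]}),
    pd.DataFrame({"hour": [0, 6, 12, 18], "occupied": [1, 1, 0, 1]}),
]

# Stack all households into one training table, the analogue of
# appending the example sets before a single tree learner.
data = pd.concat(households, ignore_index=True)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(data[["hour"]], data["occupied"])
pred = tree.predict(data[["hour"]])
```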

 

Thank you very much.

Best regards,

Lisa

Wrong Credentials


Hi! I am having trouble opening RapidMiner; it keeps showing the message "please enter a valid username and password to access the SOCKS proxy 127.0.0.1:54761". Please help!

Thank You!

Two classification models in one process?


Hello RM Community,

 

I've been lurking around the forums for quite some time now and I am still learning this new tool. I just wanted to ask: is it possible to run two classification models (e.g. k-NN and decision tree) in one process? If yes, is it possible to get separate performance results for each?
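The general idea of training two learners on the same data and scoring each one separately can be sketched in scikit-learn (a stand-in for the RapidMiner operators; the iris data set is just a placeholder):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; in Studio this would be the shared example set.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train both models on the same training data.
knn = KNeighborsClassifier().fit(X_tr, y_tr)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Evaluate each model separately on the same test data.
acc_knn = accuracy_score(y_te, knn.predict(X_te))
acc_tree = accuracy_score(y_te, tree.predict(X_te))
```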

 

Sorry if it's a newbie question. 

 

Thanks in advance!

Rem

LDA Optimization


Hi guys, any idea how best to tweak the parameters when optimizing the LDA model? I'm playing around with this example using RSS news feeds and am not 100% sure whether the Optimize operator works well enough for small numbers of topics. Do you think the data set simply isn't large enough?

 

Please note: increasing the start and end values for the number of topics gives better results. I want to build a generic example of usage, so any pointers are welcome.
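The grid idea itself can be sketched outside Studio. Here scikit-learn's LDA is fitted for several topic counts and each fit is scored by perplexity (lower is better), a rough analogue of the Optimize Parameters (Grid) loop; the toy corpus merely mimics the RSS headlines:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the appended RSS headlines.
docs = [
    "markets rally as shares rise",
    "new film wins top award",
    "shares fall on weak earnings",
    "actor joins award ceremony",
] * 10

X = CountVectorizer().fit_transform(docs)

# Grid over the number of topics, keeping the fit with the
# lowest perplexity on the training data.
best_k, best_perplexity = None, None
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    p = lda.perplexity(X)
    if best_perplexity is None or p < best_perplexity:
        best_k, best_perplexity = k, p
```

Scoring on held-out documents rather than the training data would be the more defensible choice for picking the topic count.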

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process"><process expanded="true"><operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="103" name="Get News Feeds" width="90" x="45" y="34"><process expanded="true"><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Top Stories" width="90" x="45" y="34"><parameter key="url" value="http://feeds.bbci.co.uk/news/rss.xml"/></operator><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Asia" width="90" x="45" y="85"><parameter key="url" value="http://feeds.bbci.co.uk/news/world/asia/rss.xml"/></operator><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Business" width="90" x="45" y="136"><parameter key="url" value="http://feeds.bbci.co.uk/news/business/rss.xml"/></operator><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Entertainment" width="90" x="45" y="187"><parameter key="url" value="http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml"/></operator><operator activated="true" class="append" compatibility="8.1.001" expanded="true" height="145" name="Append" width="90" x="179" y="34"/><operator activated="true" class="generate_copy" compatibility="8.1.001" expanded="true" height="82" name="Generate Copy" width="90" x="313" y="34"><parameter key="attribute_name" value="Title"/><parameter key="new_name" value="Title2"/></operator><operator activated="true" class="text_to_nominal" compatibility="8.1.001" expanded="true" height="82" name="Text to Nominal" width="90" x="447" y="34"><parameter key="attribute_filter_type" value="subset"/><parameter key="attributes" value="Link|Title2"/><description 
align="center" color="transparent" colored="false" width="126">Don't convert article link to document text.</description></operator><operator activated="true" class="split_data" compatibility="8.1.001" expanded="true" height="103" name="Split Data" width="90" x="581" y="34"><enumeration key="partitions"><parameter key="ratio" value="0.7"/><parameter key="ratio" value="0.3"/></enumeration><parameter key="sampling_type" value="shuffled sampling"/><description align="center" color="transparent" colored="false" width="126">Randomly sort the data from the feeds. Split into training &amp;amp; testing.</description></operator><connect from_op="BBC Top Stories" from_port="output" to_op="Append" to_port="example set 1"/><connect from_op="BBC Asia" from_port="output" to_op="Append" to_port="example set 2"/><connect from_op="BBC Business" from_port="output" to_op="Append" to_port="example set 3"/><connect from_op="BBC Entertainment" from_port="output" to_op="Append" to_port="example set 4"/><connect from_op="Append" from_port="merged set" to_op="Generate Copy" to_port="example set input"/><connect from_op="Generate Copy" from_port="example set output" to_op="Text to Nominal" to_port="example set input"/><connect from_op="Text to Nominal" from_port="example set output" to_op="Split Data" to_port="example set"/><connect from_op="Split Data" from_port="partition 1" to_port="out 1"/><connect from_op="Split Data" from_port="partition 2" to_port="out 2"/><portSpacing port="source_in 1" spacing="0"/><portSpacing port="sink_out 1" spacing="0"/><portSpacing port="sink_out 2" spacing="0"/><portSpacing port="sink_out 3" spacing="0"/></process></operator><operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to Documents (2)" width="90" x="179" y="289"><list key="specify_weights"/></operator><operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to 
Documents" width="90" x="179" y="34"><list key="specify_weights"/></operator><operator activated="true" class="loop_collection" compatibility="8.1.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34"><process expanded="true"><operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34"/><operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="179" y="34"/><operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="313" y="34"><parameter key="min_chars" value="2"/></operator><operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="447" y="34"/><operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="581" y="34"/><connect from_port="single" to_op="Tokenize" to_port="document"/><connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/><connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/><connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/><connect from_op="Filter Stopwords (English)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/><connect from_op="Generate n-Grams (Terms)" from_port="document" to_port="output 1"/><portSpacing port="source_single" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/></process><description align="center" color="transparent" colored="false" width="126">Text Prep using Text 
Mining</description></operator><operator activated="true" class="loop_collection" compatibility="8.1.001" expanded="true" height="82" name="Loop Collection (3)" width="90" x="313" y="238"><process expanded="true"><operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (3)" width="90" x="45" y="34"/><operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="179" y="34"/><operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (3)" width="90" x="313" y="34"><parameter key="min_chars" value="2"/></operator><operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="447" y="34"/><operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (2)" width="90" x="581" y="34"/><connect from_port="single" to_op="Tokenize (3)" to_port="document"/><connect from_op="Tokenize (3)" from_port="document" to_op="Transform Cases (2)" to_port="document"/><connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Tokens (3)" to_port="document"/><connect from_op="Filter Tokens (3)" from_port="document" to_op="Filter Stopwords (3)" to_port="document"/><connect from_op="Filter Stopwords (3)" from_port="document" to_op="Generate n-Grams (2)" to_port="document"/><connect from_op="Generate n-Grams (2)" from_port="document" to_port="output 1"/><portSpacing port="source_single" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/></process><description align="center" color="transparent" colored="false" width="126">Text Prep using Text Mining</description></operator><operator activated="true" class="concurrency:optimize_parameters_grid" 
compatibility="8.1.001" expanded="true" height="187" name="Optimize Parameters (Grid)" width="90" x="514" y="34"><list key="parameters"><parameter key="LDA.number_of_topics" value="[5;20;5;linear]"/></list><process expanded="true"><operator activated="true" class="operator_toolbox:lda" compatibility="1.0.000" expanded="true" height="124" name="LDA" width="90" x="179" y="34"><parameter key="number_of_topics" value="20"/><parameter key="iterations" value="100"/><parameter key="use_local_random_seed" value="true"/><parameter key="local_random_seed" value="1997"/></operator><operator activated="true" class="generate_direct_mailing_data" compatibility="8.1.001" expanded="true" height="68" name="Generate Direct Mailing Data" width="90" x="112" y="238"/><operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="246" y="238"><description align="center" color="transparent" colored="false" width="126">This is because the optimize operator doesn't recognize the LDA model type.</description></operator><connect from_port="input 1" to_op="LDA" to_port="col"/><connect from_op="LDA" from_port="exa" to_port="output 1"/><connect from_op="LDA" from_port="top" to_port="output 2"/><connect from_op="LDA" from_port="mod" to_port="output 3"/><connect from_op="LDA" from_port="per" to_port="performance"/><connect from_op="Generate Direct Mailing Data" from_port="output" to_op="Decision Tree" to_port="training set"/><connect from_op="Decision Tree" from_port="model" to_port="model"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="source_input 2" spacing="0"/><portSpacing port="sink_performance" spacing="0"/><portSpacing port="sink_model" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/><portSpacing port="sink_output 3" spacing="0"/><portSpacing port="sink_output 4" spacing="0"/></process></operator><operator 
activated="true" class="operator_toolbox:apply_model_documents" compatibility="1.0.000" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="581" y="238"/><connect from_op="Get News Feeds" from_port="out 1" to_op="Data to Documents" to_port="example set"/><connect from_op="Get News Feeds" from_port="out 2" to_op="Data to Documents (2)" to_port="example set"/><connect from_op="Data to Documents (2)" from_port="documents" to_op="Loop Collection (3)" to_port="collection"/><connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/><connect from_op="Loop Collection" from_port="output 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/><connect from_op="Loop Collection (3)" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/><connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 2"/><connect from_op="Optimize Parameters (Grid)" from_port="output 1" to_port="result 3"/><connect from_op="Optimize Parameters (Grid)" from_port="output 2" to_port="result 4"/><connect from_op="Optimize Parameters (Grid)" from_port="output 3" to_op="Apply Model (Documents)" to_port="mod"/><connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="84"/><portSpacing port="sink_result 2" spacing="0"/><portSpacing port="sink_result 3" spacing="0"/><portSpacing port="sink_result 4" spacing="0"/><portSpacing port="sink_result 5" spacing="0"/></process></operator></process>

Does RapidMiner have the SPADE algorithm for sequential pattern mining?


Hi RapidMiner,

 

SPADE (Sequential Pattern Discovery using Equivalence classes) is another algorithm for sequential pattern mining besides GSP, and CM-SPADE is an improved version of SPADE that uses co-occurrence information. May I know whether RapidMiner has either SPADE or CM-SPADE? If not, do you plan to implement these two? I've seen articles saying SPADE is faster than GSP on large data sets.

 

Thank you very much for this information.

 

P.S.: below are links to the papers on SPADE and CM-SPADE:

 

https://link.springer.com/content/pdf/10.1023/A:1007652502315.pdf

 

https://www.philippe-fournier-viger.com/spmf/PAKDD2014_sequential_pattern_mining_CM-SPADE_CM-SPAM.pdf

 

Best Regards,

Phi Vu

how to change credentials password


Hello! I would like to know how to change my credentials password. I want to access the Marketplace but I can't log in because it says either my username or password is wrong. How can I update my credentials?

Thank you

Problem with Neural Net "use local random seed"


Hi guys, I have a small problem.

 

How is it possible that, when "use local random seed" on the Neural Net operator is not enabled (unchecked), the same NN process does NOT produce the same results?

 

What is the purpose of "use local random seed"?
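For context on why unchecked means non-repeatable: without a fixed seed the network's initial weights are drawn from a fresh random state on every run, so training converges to slightly different weights each time; fixing the seed makes the initialization, and hence the whole run, repeatable. A minimal NumPy sketch of the effect (the uniform initialization is an illustrative assumption, not RapidMiner's actual scheme):

```python
import numpy as np

def init_weights(seed=None):
    # seed=None mimics "use local random seed" unchecked: a fresh
    # random state per run, so the initial weights always differ.
    # A fixed seed makes the initialization repeatable.
    rng = np.random.default_rng(seed)
    # Uniform init is illustrative only, not RapidMiner's scheme.
    return rng.uniform(-1.0, 1.0, size=5)

seeded_a = init_weights(seed=1992)
seeded_b = init_weights(seed=1992)
unseeded_a = init_weights()
unseeded_b = init_weights()
```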

 

This is my XML process:

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="read_csv" compatibility="8.1.001" expanded="true" height="68" name="Read CSV" width="90" x="112" y="187">
<parameter key="csv_file" value="C:\Users\Admin\Desktop\data Example.csv"/>
<parameter key="column_separators" value=";"/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes T" width="90" x="246" y="187">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="F|D|C|B|A"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role T" width="90" x="380" y="187">
<parameter key="attribute_name" value="F"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="145" name="Multiply Data T" width="90" x="514" y="187"/>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net" width="90" x="715" y="136">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN" width="90" x="849" y="136">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN" width="90" x="983" y="136">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net (2)" width="90" x="715" y="238">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN (2)" width="90" x="849" y="238">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN (2)" width="90" x="983" y="238">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net (3)" width="90" x="715" y="340">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN (3)" width="90" x="849" y="340">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN (3)" width="90" x="983" y="340">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net 21" width="90" x="715" y="442">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN (4)" width="90" x="849" y="442">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN (4)" width="90" x="983" y="442">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>

 

Can anyone help me?
Thanks!


Best practices for text mining an academic text


I have long, complex texts that I want to classify into categories such as psychology, history, etc.

Which processing steps would you recommend? E.g. tokenization, n-grams, etc.
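As a rough baseline, tokenization, lowercasing, stop-word removal and uni/bi-grams can all be folded into a single vectorizer feeding a linear classifier. A scikit-learn sketch with a tiny made-up corpus (the texts and labels are purely illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Purely illustrative corpus; real academic texts are far longer.
texts = [
    "memory cognition behavior experiment",
    "war treaty century empire",
    "perception attention stimulus response",
    "revolution dynasty archive chronicle",
]
labels = ["psychology", "history", "psychology", "history"]

# Tokenization, lowercasing, stop-word removal and uni/bi-grams
# all happen inside the vectorizer.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(),
)
clf.fit(texts, labels)
pred = clf.predict(["empire and treaty in the nineteenth century"])
```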

Thank you

How to change the values of an attribute?


Hi Experts,

I'm starting to work with RapidMiner and I need your help.
I have a database with 23 attributes and I need to change the values of the attribute "outcome". This attribute has the values "Lived", "Died" and "Euthanized"; in other words, after reading the CSV file I need to treat the database by changing the values of the attribute "outcome".
I would like to change every value equal to "Euthanized" to "Died". Which operators can I use?
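The transformation itself is a simple value mapping (in RapidMiner, the Map operator covers this kind of nominal-value replacement). A pandas sketch of the same idea on a toy column:

```python
import pandas as pd

# Toy column standing in for the "outcome" attribute of the real CSV.
df = pd.DataFrame({"outcome": ["Lived", "Died", "Euthanized", "Lived"]})

# Map every "Euthanized" value to "Died", leaving all others unchanged.
df["outcome"] = df["outcome"].replace({"Euthanized": "Died"})
```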

 

Thanks in advance my friends.

 

Marcelo Batista

 

Cassandra driver exception for timestamp


Hi 

I am trying to connect RapidMiner to Cassandra 3.1 and I believe I have found a bug. I am using manual queries to test the connection. Most queries work fine, but reading any timestamp throws an exception.

Cassandra driver Codec not found for requested operation: [timestamp <-> com.datastax.driver.core.LocalDate]

(full stack trace and example table below)

This problem appears to be related to the issue https://datastax-oss.atlassian.net/browse/JAVA-1176, where the Cassandra driver changed the method calls for timestamps but RapidMiner is still using the old method. (On a related note, it would be very useful if we could inject our own custom codecs for Cassandra, https://docs.datastax.com/en/developer/java-driver/3.3/manual/custom_codecs/, which would let developers better handle custom data types stored in Cassandra databases.)

 

Suggested workarounds are welcome. :)

 

Full details follow:

My cassandra table is defined as;

CREATE TABLE $KEYSPACE$.samples (
    context text,
    partition int,
    resource text,
    collected_at timestamp,
    metric_name text,
    value blob,
    attributes map<text, text>,
    PRIMARY KEY ((context, partition, resource), collected_at, metric_name)
);

 

I can perform a query which works OK for most columns in the table. e.g.

 

select context, partition, resource, metric_name from samples;

 

but if I try to select the timestamp using

 

select  collected_at  from samples;

 

rapidminer throws the following exception

Exception: com.rapidminer.operator.OperatorException
Message: Unknown error. Something went wrong.
Stack trace:
com.rapidminer.extension.nosql.operator.cassandra.ReadCassandraOperator.createExampleSet(ReadCassandraOperator.java:112)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:49)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:33)
  com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:126)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:812)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:807)
  java.security.AccessController.doPrivileged(Native Method)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:428)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.Process.execute(Process.java:1310)
  com.rapidminer.Process.run(Process.java:1285)
  com.rapidminer.Process.run(Process.java:1176)
  com.rapidminer.Process.run(Process.java:1129)
  com.rapidminer.Process.run(Process.java:1124)
  com.rapidminer.Process.run(Process.java:1114)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)

Cause
Exception: com.datastax.driver.core.exceptions.CodecNotFoundException
Message: Codec not found for requested operation: [timestamp <-> com.datastax.driver.core.LocalDate]
Stack trace:

  com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:741)
  com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:588)
  com.datastax.driver.core.CodecRegistry.access$500(CodecRegistry.java:137)
  com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:246)
  com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:232)
  com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
  com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
  com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
  com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
  com.google.common.cache.LocalCache.get(LocalCache.java:3953)
  com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
  com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
  com.datastax.driver.core.CodecRegistry.lookupCodec(CodecRegistry.java:522)
  com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:485)
  com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:467)
  com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:69)
  com.datastax.driver.core.AbstractGettableByIndexData.getDate(AbstractGettableByIndexData.java:174)
  com.datastax.driver.core.AbstractGettableData.getDate(AbstractGettableData.java:26)
  com.datastax.driver.core.AbstractGettableData.getDate(AbstractGettableData.java:111)
  com.rapidminer.extension.nosql.operator.cassandra.ReadCassandraOperator.createExampleSet(ReadCassandraOperator.java:162)
  com.rapidminer.extension.nosql.operator.cassandra.ReadCassandraOperator.createExampleSet(ReadCassandraOperator.java:97)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:49)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:33)
  com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:126)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:812)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:807)
  java.security.AccessController.doPrivileged(Native Method)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:428)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.Process.execute(Process.java:1310)
  com.rapidminer.Process.run(Process.java:1285)
  com.rapidminer.Process.run(Process.java:1176)
  com.rapidminer.Process.run(Process.java:1129)
  com.rapidminer.Process.run(Process.java:1124)
  com.rapidminer.Process.run(Process.java:1114)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)

Generate a Random Number (New Number With Each Instance)


Hi there

 

I'm working on a process that creates two random numbers. The following ways have been tested already:

  • I used the "Generate Data" operator and tried to use both the attributes as well the label
  • I used the "Generate Attributes" rand()

Each scenario gave a positive result: the output shows the two numbers the way I want. However, if I run the process a second time, the numbers do not change; no new random numbers are generated, and every run of the process outputs the same numbers.

What operator or workaround should I use to get new random numbers each time I run the process?

 

Many thanks in advance!

Roman
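For what it's worth, a likely cause (parameter names from memory, worth verifying in your Studio version): RapidMiner processes are deliberately reproducible, so the process and its operators draw from a fixed random seed by default; changing the process-level random seed (or an operator's "use local random seed" setting) to a varying value such as -1 is the usual way to get fresh numbers on each run. The underlying mechanism can be sketched in Python:

```python
import random
import time

def make_rng(seed=2001):
    # A fixed seed makes the "random" numbers reproducible:
    # every run replays exactly the same sequence.
    return random.Random(seed)

def make_fresh_rng():
    # Seeding from the clock (the analogue of disabling the fixed
    # seed, e.g. setting it to -1) yields new numbers on each run.
    return random.Random(time.time_ns())

# Two generators built with the same fixed seed agree exactly:
print([make_rng().random() for _ in range(2)] ==
      [make_rng().random() for _ in range(2)])  # True
```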

Looping through an example to generate more examples


My data is confidential, so I cannot post it.

Here is an example of what my data looks like:

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

I would like to transform it into:

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

 

Please help me do this.
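Operators worth looking at for this are De-Pivot or a loop over attribute subsets (names worth double-checking for your version); the underlying transformation is just slicing one wide row into fixed-size column groups. A minimal Python sketch, assuming every group has the same layout (1 empty column plus 10 attributes = 11 columns):

```python
def split_wide_row(row, group_size=11):
    # Split one flat row whose columns repeat in fixed-size groups
    # into one output row per group.
    if len(row) % group_size != 0:
        raise ValueError("row length is not a multiple of the group size")
    return [row[i:i + group_size] for i in range(0, len(row), group_size)]

# Toy example with 3-column groups instead of 11:
wide = ["", "a1", "a2"] * 4
print(split_wide_row(wide, group_size=3))
# [['', 'a1', 'a2'], ['', 'a1', 'a2'], ['', 'a1', 'a2'], ['', 'a1', 'a2']]
```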

 

Loop Files Operator runs forever

How to work with a very large .csv file?


Hi folks, it looks like there have been a similar post or two in the past, but they are years old at this point, so I thought it would be helpful to refresh...

 

I need to load a huge .csv file (4.72GB, ~23MM lines), and I need to break it up into smaller .csv files according to one polynomial attribute. This is public State of Texas data, so the attribute by which I want to split into smaller data sets is "County", and I want those new .csv files to be kicked out onto my local disk. 

 

What's the best, most computationally efficient way to do this? 
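One RapidMiner pattern for this is Loop Values over County feeding Filter Examples and Write CSV, but that rescans the data once per county; outside Studio, a single streaming pass over the file is about as efficient as it gets, since memory use stays constant. A Python sketch (the "County" column name is taken from the post; assumes the file has a header row):

```python
import csv
import os

def split_csv_by_column(path, column, out_dir):
    # Stream the CSV once, appending each row to a per-value output
    # file; memory use is constant regardless of input size.
    os.makedirs(out_dir, exist_ok=True)
    writers, handles = {}, {}
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            key = row[column]
            if key not in writers:
                handle = open(os.path.join(out_dir, key + ".csv"),
                              "w", newline="")
                handles[key] = handle
                writer = csv.DictWriter(handle, fieldnames=reader.fieldnames)
                writer.writeheader()
                writers[key] = writer
            writers[key].writerow(row)
    for handle in handles.values():
        handle.close()
```

Texas has a few hundred counties, so keeping one open file handle per county stays well within OS limits.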

 

Thanks!!

 

cc 


No difference between "polynominal" and "text" data type?


Hi,

 

I checked the manual and this forum but I cannot find the answer.

 

Is there possibly no difference between the "text" and "polynominal" data types in RM?

 

I am asking because I worked with quite a large data set, and RM seems to save text variables the same way it does polynominals: as categorical variables (or factors).

The metadata file gets really huge and this slows down RM a lot when loading and handling data.

 

Could this be true?

 

Best

Karl

exploratory data analysis using the chi-square test


I'm having trouble running the chi-square test on my dataset. My dataset has real or polynominal values.

Here is the XML of my process:

<?xml version="1.0" encoding="UTF-8"?>
<process version="8.1.001">
  <operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve house_prices" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Local Repository/house_prices"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="8.1.001">
  <operator activated="true" class="weka:W-ChiSquaredAttributeEval" compatibility="7.3.000" expanded="true" height="82" name="W-ChiSquaredAttributeEval" width="90" x="313" y="34">
    <parameter key="normalize_weights" value="false"/>
    <parameter key="sort_weights" value="true"/>
    <parameter key="sort_direction" value="ascending"/>
    <parameter key="M" value="false"/>
    <parameter key="B" value="false"/>
  </operator>
</process>

can anyone help me please?
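As far as I can tell, the pasted XML is two separate one-operator processes, so nothing is wired into the Weka evaluator; it also needs a nominal label (set with Set Role) before it can score attributes, and real-valued attributes are typically discretized for a chi-square test (worth verifying against your Studio version). Independent of tooling, the statistic itself is easy to compute from a contingency table of attribute value vs. label; a pure-Python sketch:

```python
def chi_square_statistic(table):
    # Pearson chi-square statistic for a 2-D contingency table:
    # sum over cells of (observed - expected)^2 / expected, where
    # expected = row_total * col_total / grand_total.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

print(round(chi_square_statistic([[10, 20], [20, 10]]), 4))  # 6.6667
```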

Web Mining


hi everybody,

I'm new to web mining with RapidMiner, and I have a project that I have to do. Please help!

This is the Yahoo Finance link: https://finance.yahoo.com/quote/TSLA?p=TSLA

I just want to crawl this site and learn whether Tesla stock is seen positively or negatively.

I have to use this result in my RapidMiner project, but like I said, I'm new to web mining with RapidMiner.

I don't know how to do it. Please show me a simple RapidMiner project for my problem.

Convert Confidence Values to Regular Attribute


I am scoring data and get results in a column called "confidence". Easy enough, but this result data is hidden from use in downstream processes - cannot see the column in generate attributes, set role, select attributes, etc. The only place I see the output is in the yellow results column. Is there an operator to retrieve this confidence data and use it in downstream process steps? I can export to excel and reimport but this seems silly. thanks, Mike

Creating a history of stock balance


Hi guys,

Hope you are doing well.

I have been trying to recreate a history of stock balance, but so far, even consulting some people with good experience in rapidminer, I haven’t been able to do what I need. Maybe you guys would be able to help me.

Below is an example set. Basically, I have the initial stock on 02/04. Based on the despatches and receipts of the previous days, I need to calculate the initial stock for 01/04 and 31/03. All the processes I have tried work for a single product code only; when the data moves on to the next product code, they give incorrect values.

 

Code  | Initial Stock on 02/04 | Trx Date   | Trx Time | Trx Type   | Adjust
SAN01 | 100                    | 01/04/2018 | 21:00    | Despatched | -10
SAN01 | 100                    | 01/04/2018 | 20:00    | Received   | 5
SAN01 | 100                    | 01/04/2018 | 19:00    | Despatched | -7
SAN01 | 100                    | 31/03/2018 | 17:00    | Despatched | -6
SAN01 | 100                    | 31/03/2018 | 16:00    | Despatched | -3
SAN01 | 100                    | 31/03/2018 | 15:00    | Received   | 7
SAN02 | 120                    | 01/04/2018 | 21:00    | Despatched | -2
SAN02 | 120                    | 01/04/2018 | 20:00    | Received   | 3
SAN02 | 120                    | 01/04/2018 | 19:00    | Despatched | -5
SAN02 | 120                    | 31/03/2018 | 17:00    | Despatched | -6
SAN02 | 120                    | 31/03/2018 | 16:00    | Despatched | -2
SAN02 | 120                    | 31/03/2018 | 15:00    | Received   | 6

 

Below is what I expect to get.

Code  | Date       | Initial Stock
SAN01 | 01/04/2018 | 88
SAN01 | 31/03/2018 | 86
SAN02 | 01/04/2018 | 116
SAN02 | 31/03/2018 | 114

 

thank you all in advance.
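In RapidMiner terms this usually ends up as Aggregate (net adjustment per code and date) followed by a grouped cumulative sum, and the per-code grouping is exactly what breaks naive single-pass approaches. The core logic can be sketched in Python (function and variable names are mine; it reproduces the expected balances from the post):

```python
from collections import OrderedDict

def reconstruct_stock_history(initial_by_code, transactions):
    # transactions: (code, date, adjust) tuples, already ordered
    # newest-first within each code, as in the example set.
    # Working backwards, each day's opening balance is the later
    # balance plus that day's net adjustment.
    net = OrderedDict()
    for code, date, adjust in transactions:
        net[(code, date)] = net.get((code, date), 0) + adjust
    rows, running = [], {}
    for (code, date), total in net.items():
        running[code] = running.get(code, initial_by_code[code]) + total
        rows.append((code, date, running[code]))
    return rows
```

For SAN01 this yields 100 + (-10 + 5 - 7) = 88 on 01/04 and 88 + (-6 - 3 + 7) = 86 on 31/03, and the running balance restarts correctly when the data moves on to SAN02.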
